From b81a52e69d77d69c35e5a1379e77c87b85dcd543 Mon Sep 17 00:00:00 2001 From: Shinsuke Sugaya Date: Sun, 3 May 2026 21:30:33 +0900 Subject: [PATCH 01/16] feat(skill): add fessctl SKILL.md skeleton --- skills/fessctl/SKILL.md | 70 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 70 insertions(+) create mode 100644 skills/fessctl/SKILL.md diff --git a/skills/fessctl/SKILL.md b/skills/fessctl/SKILL.md new file mode 100644 index 0000000..ba03448 --- /dev/null +++ b/skills/fessctl/SKILL.md @@ -0,0 +1,70 @@ +--- +name: fessctl +description: Operate Fess via fessctl CLI (admin API client). Use when managing webconfigs, file/data configs, schedulers, users/roles/groups, dictionaries, access tokens, or any Fess admin resource from Claude. Covers CRUD on 22 resource types and cross-feature workflows like initial crawl setup. +license: Apache-2.0 +version: 0.1.0 +--- + +# fessctl + +`fessctl` is the Python CLI for the Fess admin REST API. This skill teaches Claude to detect/run fessctl, authenticate, and operate every supported resource. + +## Detection (run in this order) + +1. `command -v fessctl` → use it directly +2. `$FESS_WORKSPACE/repos/fessctl` exists AND `command -v uv` → `cd $FESS_WORKSPACE/repos/fessctl && uv run fessctl` +3. Fall back to `docker run --rm -e FESS_ENDPOINT -e FESS_ACCESS_TOKEN -e FESS_VERSION ghcr.io/codelibs/fessctl:` + +See `references/installation.md` for the exact wrappers. + +## Required environment + +- `FESS_ENDPOINT` (default `http://localhost:8080`) +- `FESS_ACCESS_TOKEN` (required for any non-`ping` call) +- `FESS_VERSION` (e.g. `15.6.0`; must match the target Fess server) + +See `references/authentication.md` for token issuance. + +## Always do this first + +Before invoking any subcommand the assistant has not seen recently, run `fessctl --help` to confirm the current option surface — fessctl evolves with Fess and `--help` is the source of truth. 
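
The three variables listed under "Required environment" can be checked before the first real call. What follows is a minimal sketch of such a check, not something fessctl ships: the function name `fess_preflight` is hypothetical, and the only facts it relies on are the variable names and the `http://localhost:8080` default stated above. It reports whether the token is set without ever printing its value.

```bash
# fess_preflight: hypothetical helper, not part of fessctl.
# Checks the three documented environment variables and never echoes the token.
fess_preflight() {
  local rc=0
  # FESS_ENDPOINT has a documented default, so an unset value is not an error.
  echo "FESS_ENDPOINT=${FESS_ENDPOINT:-http://localhost:8080}"
  if [ -z "${FESS_VERSION:-}" ]; then
    echo "FESS_VERSION is unset; set it to match the target Fess server" >&2
    rc=1
  else
    echo "FESS_VERSION=${FESS_VERSION}"
  fi
  if [ -z "${FESS_ACCESS_TOKEN:-}" ]; then
    echo "FESS_ACCESS_TOKEN is unset; only ping will work" >&2
    rc=1
  else
    # Report presence only, so the secret does not land in logs or chat output.
    echo "FESS_ACCESS_TOKEN is set"
  fi
  return "$rc"
}

# Usage (assumes fessctl is already resolvable):
#   fess_preflight && fessctl ping
```

Returning a non-zero status instead of exiting keeps the helper safe to source into an interactive shell.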
+ +## Index — common references + +| Topic | File | +|-------|------| +| Install / detection / wrappers | references/installation.md | +| Auth, tokens, env vars | references/authentication.md | +| Output formats (JSON/YAML/MD) | references/output-formats.md | +| CRUD conventions, IDs, paging | references/conventions.md | +| Common errors and recovery | references/troubleshooting.md | +| Multi-feature recipes | references/workflows.md | + +## Resources — per-feature reference + +Each file documents one Fess admin feature: what it is, when to use it, fessctl subcommand surface, JSON shape, gotchas, and examples. + +| Feature | File | +|---------|------| +| Web crawl configs | references/features/webconfig.md | +| File crawl configs | references/features/fileconfig.md | +| Datastore configs | references/features/dataconfig.md | +| Web auth credentials | references/features/webauth.md | +| File auth credentials | references/features/fileauth.md | +| Crawl scheduler / jobs | references/features/scheduler.md | +| Job logs | references/features/joblog.md | +| Crawling info | references/features/crawlinginfo.md | +| Users | references/features/user.md | +| Roles | references/features/role.md | +| Groups | references/features/group.md | +| Access tokens | references/features/accesstoken.md | +| Label types | references/features/labeltype.md | +| Key match | references/features/keymatch.md | +| Boost document | references/features/boostdoc.md | +| Elevate word | references/features/elevateword.md | +| Bad word | references/features/badword.md | +| Related content | references/features/relatedcontent.md | +| Related query | references/features/relatedquery.md | +| Path mapping | references/features/pathmap.md | +| Duplicate host | references/features/duplicatehost.md | +| Request header | references/features/reqheader.md | From 7006e1b7329533506bb3e4710b126fb8568bf191 Mon Sep 17 00:00:00 2001 From: Shinsuke Sugaya Date: Sun, 3 May 2026 21:34:31 +0900 Subject: [PATCH 02/16] 
feat(skill): add installation reference --- skills/fessctl/references/installation.md | 95 +++++++++++++++++++++++ 1 file changed, 95 insertions(+) create mode 100644 skills/fessctl/references/installation.md diff --git a/skills/fessctl/references/installation.md b/skills/fessctl/references/installation.md new file mode 100644 index 0000000..2cdb03c --- /dev/null +++ b/skills/fessctl/references/installation.md @@ -0,0 +1,95 @@ +# Installing & Invoking fessctl + +`fessctl` is delivered as a Python package and a published Docker image. The skill picks one of three runners depending on what is available locally. + +## Detection chain + +Resolve the runner once at the start of a session. Use the first branch that succeeds: + +```bash +resolve_fessctl() { + if command -v fessctl >/dev/null 2>&1; then + FESSCTL="fessctl" + return + fi + if [[ -d "${FESS_WORKSPACE:-$PWD}/repos/fessctl" ]] && command -v uv >/dev/null 2>&1; then + FESSCTL="uv --directory ${FESS_WORKSPACE:-$PWD}/repos/fessctl run fessctl" + return + fi + FESSCTL="docker run --rm \ + -e FESS_ENDPOINT -e FESS_ACCESS_TOKEN -e FESS_VERSION \ + --add-host=host.docker.internal:host-gateway \ + ghcr.io/codelibs/fessctl:${FESS_VERSION:-latest}" +} + +resolve_fessctl +$FESSCTL ping +``` + +The order matters: a system-PATH `fessctl` (e.g. installed via `pipx` or any future package manager) is the fastest invocation, followed by an in-tree `uv run` against `repos/fessctl`, with the published Docker image as the universal fallback. + +## Option A — system PATH install + +Recommended for end users who do not have a `fess-workspace` checkout. + +```bash +pipx install fessctl +fessctl --help +``` + +`pipx` is preferred because it isolates fessctl in its own virtualenv. 
As of v0.1.0 the project is also installable from source: + +```bash +git clone https://github.com/codelibs/fessctl.git +cd fessctl +uv pip install -e src +``` + +After either install, `command -v fessctl` should print a path on `$PATH` and the detection chain will pick this branch. + +## Option B — fess-workspace dev mode + +Use this when you are actively editing fessctl source inside a `fess-workspace` clone. Local edits are picked up on the next invocation. + +```bash +cd "$FESS_WORKSPACE/repos/fessctl" +uv sync +uv run fessctl --help +``` + +`uv sync` only needs to run when `pyproject.toml` or `uv.lock` changes; subsequent calls reuse the cached environment. This branch is what the detection chain selects when `repos/fessctl` exists and `uv` is on `$PATH`. + +## Option C — Docker + +Use this when neither a PATH install nor a fess-workspace clone is available. + +```bash +docker run --rm \ + -e FESS_ENDPOINT="$FESS_ENDPOINT" \ + -e FESS_ACCESS_TOKEN="$FESS_ACCESS_TOKEN" \ + -e FESS_VERSION="$FESS_VERSION" \ + --add-host=host.docker.internal:host-gateway \ + ghcr.io/codelibs/fessctl:0.1.0 \ + ping +``` + +Two networking notes for reaching a Fess server running on the **host**: + +- macOS / Windows Docker Desktop: `host.docker.internal` already resolves to the host there, so the `--add-host=host.docker.internal:host-gateway` flag is redundant but harmless. Set `FESS_ENDPOINT=http://host.docker.internal:8080` for a host-mode Fess. +- Linux: prefer `--network host` and keep `FESS_ENDPOINT=http://localhost:8080`. Without host networking, the `--add-host` flag (Docker Engine 20.10 or later) is what makes `host.docker.internal` resolvable. + +## Choosing the Docker tag + +The Docker image is published at `ghcr.io/codelibs/fessctl`. Pin a tag rather than `latest` for reproducible runs. The convention is to keep the image tag close to the Fess version it has been validated against — if you are talking to a Fess 15.6 server, prefer the tag whose `FESS_VERSION` default matches. Inspect available tags at if unsure. 
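
The advice to pin a tag can be enforced in the wrapper itself. The sketch below is one possible approach and rests on assumptions: `fessctl_docker_cmd` and the `FESSCTL_IMAGE_TAG` variable are hypothetical names introduced here (fessctl does not read them), and the image tag is kept separate from `FESS_VERSION` because the two are not necessarily the same string.

```bash
# fessctl_docker_cmd: hypothetical helper, not part of fessctl.
# Prints a docker invocation for the published image and refuses to run
# without an explicitly pinned tag. FESSCTL_IMAGE_TAG is an assumed name.
fessctl_docker_cmd() {
  local tag="${FESSCTL_IMAGE_TAG:-}"
  if [ -z "$tag" ] || [ "$tag" = "latest" ]; then
    echo "set FESSCTL_IMAGE_TAG to a pinned tag, not 'latest'" >&2
    return 1
  fi
  printf '%s\n' "docker run --rm -e FESS_ENDPOINT -e FESS_ACCESS_TOKEN -e FESS_VERSION ghcr.io/codelibs/fessctl:${tag}"
}

# Usage (relies on word splitting, like $FESSCTL in the detection chain):
#   FESSCTL_IMAGE_TAG=0.1.0
#   FESSCTL="$(fessctl_docker_cmd)" && $FESSCTL ping
```

Failing fast on a missing tag is a design choice: a run that silently falls back to `latest` is harder to reproduce later than one that stops with a clear message.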
+ +## Verifying the install + +Regardless of the branch chosen, the canonical smoke test is: + +```bash +$FESSCTL ping +``` + +Expected: a success message indicating the endpoint is reachable. If this fails, see `references/troubleshooting.md` — the most common causes are an unset `FESS_ENDPOINT`, an unreachable Fess server, or (for Docker) the host-network gotchas above. + +`ping` does not require `FESS_ACCESS_TOKEN`; the next test, `$FESSCTL user list --size 1`, does. Once both succeed you are wired up end-to-end. From 5ebdd999a5d9fb66a4306fd9837276c29bddc756 Mon Sep 17 00:00:00 2001 From: Shinsuke Sugaya Date: Sun, 3 May 2026 21:34:31 +0900 Subject: [PATCH 03/16] feat(skill): add authentication reference --- skills/fessctl/references/authentication.md | 81 +++++++++++++++++++++ 1 file changed, 81 insertions(+) create mode 100644 skills/fessctl/references/authentication.md diff --git a/skills/fessctl/references/authentication.md b/skills/fessctl/references/authentication.md new file mode 100644 index 0000000..8383e86 --- /dev/null +++ b/skills/fessctl/references/authentication.md @@ -0,0 +1,81 @@ +# Authenticating with Fess + +Every fessctl call other than `ping` is authenticated with a Fess access token sent as a Bearer header. This file covers the three required environment variables, how to issue a token, where to keep it, and how to verify the wiring. + +## Required environment variables + +| Variable | Required | Default | Notes | +|----------|----------|---------|-------| +| `FESS_ENDPOINT` | optional | `http://localhost:8080` | Base URL of the target Fess server. Include scheme; do not include a trailing `/`. | +| `FESS_ACCESS_TOKEN` | **yes** for any non-`ping` call | none | Bearer token issued from the Fess admin UI or via `fessctl accesstoken create`. | +| `FESS_VERSION` | optional | `15.4.0` (as of fessctl 0.1.0) | Must match the major.minor of the target Fess server so request shapes line up. Set it explicitly — do not rely on the default. 
| + +Defaults live in `src/fessctl/config/settings.py`. The defaults are conservative and may lag the latest Fess release; for any non-trivial work, set `FESS_ENDPOINT` and `FESS_VERSION` explicitly. + +## Issuing an access token + +### Via the admin UI + +1. Browse to `${FESS_ENDPOINT}/admin/` and sign in with an admin account. +2. Open **System → Access Token**. +3. Click **Create New**, give the token a name (e.g. `claude-cli`), and select the permissions it needs. For most fessctl operations the `Radmin-api` permission is required. +4. Save and copy the generated token value. It is shown only once. + +### Via fessctl (after you already have one admin token) + +```bash +fessctl accesstoken create \ + --name claude-cli \ + --permissions "Radmin-api" +``` + +See `references/features/accesstoken.md` for the full subcommand surface (list, get, update, delete). + +## Where to put the token + +Pick the option that matches how you run fessctl. + +- **direnv (`.envrc` per project):** + ```bash + export FESS_ENDPOINT=http://localhost:8080 + export FESS_ACCESS_TOKEN=eyJhbGciOi... + export FESS_VERSION=15.6.0 + ``` + Add `.envrc` to `.gitignore`. Run `direnv allow` to activate. + +- **Shell rc (`~/.zshrc`, `~/.bashrc`):** acceptable for personal machines, but prefer per-project `.envrc` so different Fess environments do not collide. + +- **GitHub Codespaces / Actions:** store as encrypted secrets and inject via `env:` in the workflow. + +- **Docker invocation:** pass `-e FESS_ACCESS_TOKEN` (the host environment variable is forwarded to the container — do **not** put the token on the command line where it would land in shell history). + +Never commit a token to git, and never include it in chat output, log files, or documentation examples. + +## Token scope and expiry + +Fess access tokens are bearer tokens. The token's permissions are fixed at creation time; rotating permissions means issuing a new token and deleting the old one. Tokens do not auto-rotate. 
If a token leaks, delete it immediately via: + +```bash +fessctl accesstoken list --output json | jq '.[] | select(.name=="claude-cli")' +fessctl accesstoken delete --id +``` + +Most fessctl subcommands require admin-equivalent permission. A token issued for a non-admin user will succeed at `ping` and may succeed at some read-only `list` operations but will fail with `403 Forbidden` on `create`/`update`/`delete`. + +## Smoke test + +The canonical wired-up check is two commands: + +```bash +fessctl ping # no token required +fessctl user list --size 1 # token required, admin permission required +``` + +Expected: + +- `ping` succeeds when `FESS_ENDPOINT` is reachable. +- `user list --size 1` succeeds when both `FESS_ACCESS_TOKEN` and `FESS_VERSION` are correct and the token has admin permission. + +## Common 401 / 403 causes + +If `user list` fails after `ping` succeeds, the token is the problem. See `references/troubleshooting.md` for the full diagnostic flow; the short list is: token unset, token typo, token expired or revoked, or token issued without `Radmin-api` permission. From f4cf76d7318ca9748f1dbae2263ae78a4b9f5ecc Mon Sep 17 00:00:00 2001 From: Shinsuke Sugaya Date: Sun, 3 May 2026 21:34:31 +0900 Subject: [PATCH 04/16] feat(skill): add output-formats reference --- skills/fessctl/references/output-formats.md | 58 +++++++++++++++++++++ 1 file changed, 58 insertions(+) create mode 100644 skills/fessctl/references/output-formats.md diff --git a/skills/fessctl/references/output-formats.md b/skills/fessctl/references/output-formats.md new file mode 100644 index 0000000..ac718e1 --- /dev/null +++ b/skills/fessctl/references/output-formats.md @@ -0,0 +1,58 @@ +# Output formats + +Every fessctl subcommand that returns data accepts `--output` (or `-o`) to choose between three serializations. + +## Available formats + +| Format | Best for | Notes | +|------------|----------|-------| +| `markdown` | Reading in chat or a terminal | Human-friendly tables and headings. 
**Do not parse it programmatically — column layout is presentational and may change between releases.** | +| `json` | Piping to `jq`, scripting | Stable, machine-readable. Top-level shape is `{"success": ..., "data": ...}` for action results and an array for `list`. Always start here when chaining commands. | +| `yaml` | Hand-editing, version control diffs | Useful for capturing settings to a file you intend to read or edit by eye. | + +The exact set of supported values is implemented in `src/fessctl/utils.py`; if you see a format mentioned in `--help` that is not listed here, prefer `--help` as the source of truth. + +## When to use which + +- **Asking Claude to summarize a list of resources** → `markdown` (concise to read). +- **Filtering a list before acting** → `json | jq`. +- **Saving config for review or backup** → `yaml`. +- **Anything that another shell command will consume** → `json` and never `markdown`. + +## Idiomatic pipelines + +```bash +# 1. List then filter to high-boost crawl configs +fessctl webconfig list --output json | jq '.[] | select(.boost > 1.0)' + +# 2. Extract one field across many rows +fessctl user list --output json | jq -r '.[].name' + +# 3. Capture config to YAML for review +fessctl webconfig get --id --output yaml > webconfig-snapshot.yaml + +# 4. Save a list as a baseline for diffing later +fessctl scheduler list --output json > scheduler-before.json +# ... make changes ... +fessctl scheduler list --output json > scheduler-after.json +diff <(jq -S . scheduler-before.json) <(jq -S . scheduler-after.json) + +# 5. Bulk delete by filter +fessctl badword list --output json \ + | jq -r '.[] | select(.suggest_word | test("^_test")) | .id' \ + | xargs -I{} fessctl badword delete --id {} +``` + +`fessctl` does not currently accept a `--from-file` style input flag (verified in `src/fessctl/commands/`). 
To re-create a resource from a captured YAML, read it back yourself and re-issue `create`/`update` with the appropriate flags — for example: + +```bash +yq '.data | "--name=" + .name + " --url=" + .urls' webconfig-snapshot.yaml +``` + +Then call `fessctl webconfig create` with those flags. If you need true round-trip restore, use multiple commands; do not assume a single-flag import path exists. + +## Caveats + +- `markdown` output is rendered for humans only. Scripts that grep markdown will break the next time the table layout shifts. +- `json` field names track the Fess admin API directly, so they may differ subtly between Fess versions. Pin `FESS_VERSION` and rerun if you see unexpected keys. +- `yaml` output preserves nesting from the underlying API but does not include comments. If you serialize and then re-import, expect identifier and timestamp fields to need stripping. From 1050399fcf34b51d4fc596cfa97116fc55e71e51 Mon Sep 17 00:00:00 2001 From: Shinsuke Sugaya Date: Sun, 3 May 2026 21:34:31 +0900 Subject: [PATCH 05/16] feat(skill): add conventions reference --- skills/fessctl/references/conventions.md | 87 ++++++++++++++++++++++++ 1 file changed, 87 insertions(+) create mode 100644 skills/fessctl/references/conventions.md diff --git a/skills/fessctl/references/conventions.md b/skills/fessctl/references/conventions.md new file mode 100644 index 0000000..33355a5 --- /dev/null +++ b/skills/fessctl/references/conventions.md @@ -0,0 +1,87 @@ +# fessctl conventions + +Patterns that hold across most fessctl subcommands. Each per-feature reference notes deviations. 
+ +## CRUD verb pattern + +Every resource type exposes a uniform set of verbs: + +``` +fessctl list # paginated list +fessctl get # one resource by --id +fessctl create # new resource, fields via flags +fessctl update # mutate one resource by --id +fessctl delete # remove one resource by --id +``` + +Exceptions: + +- **`crawlinginfo`** is effectively read-only history: it exposes only `list`, `get`, `delete`. There is no `create` or `update`. Verify with `grep '@crawlinginfo_app.command' src/fessctl/commands/crawlinginfo.py` if in doubt. +- The top-level `fessctl ping` is not part of any resource group; it is the smoke test for endpoint reachability. + +When in doubt, run `fessctl --help` to see the verbs offered for a particular resource, then `fessctl --help` for the flag list. + +## Identifiers + +Most resources use opaque Fess-internal IDs (e.g. ULID-like strings). Treat them as opaque — they are not URLs, not human-readable, and may differ between environments after an export/import. + +- `get`, `update`, `delete` always require `--id`. +- IDs are discovered via `list`. The `id` field is included in `list --output json` output. +- Do not hard-code IDs across environments. After importing settings into a new Fess server, IDs will be regenerated. + +## Pagination + +`list` is paginated with two flags (verified in `webconfig.py`): + +| Flag | Default | Notes | +|------|---------|-------| +| `--page` / `-p` | `1` | 1-indexed page number. | +| `--size` / `-s` | `100` | Page size. | + +A single `list` call returns one page. To enumerate all rows, increment `--page` until an empty page is returned. There is no `--all` shortcut at v0.1.0. + +## Required vs optional fields + +Fields are exposed as Typer options. Required fields are marked with `...` (Typer's required sentinel) in the source; running `fessctl --help` shows them with no default value, while optional fields show their default. The most reliable check is always `--help`. 
Do not infer requiredness from the JSON shape alone — Fess server-side validation is stricter than what fessctl encodes. + +## Idempotency + +`create` is **not** idempotent. Re-running a `create` with the same logical name will produce a duplicate row. Patterns to emulate upsert: + +```bash +# Look up first, decide to create or update +existing_id=$(fessctl webconfig list --output json | jq -r '.[] | select(.name=="foo") | .id') +if [[ -z "$existing_id" ]]; then + fessctl webconfig create --name foo ... +else + fessctl webconfig update --id "$existing_id" ... +fi +``` + +`delete` is idempotent in spirit (deleting a missing ID yields a 404, but the end state is the same). + +## Bulk operations + +fessctl does not provide native bulk verbs. Compose with shell: + +```bash +fessctl badword list --output json \ + | jq -r '.[].id' \ + | xargs -I{} fessctl badword delete --id {} +``` + +When deleting in bulk, capture a backup first (`fessctl list --output yaml > before.yaml`) so the operation is reversible. + +## Cross-resource dependencies + +A high-level dependency map (each per-feature file restates the relevant edges): + +- `webconfig` → references `labeltype`, `webauth` by ID/name. +- `fileconfig` → references `labeltype`, `fileauth`. +- `dataconfig` → references `labeltype`. +- `scheduler` → triggers a `webconfig` / `fileconfig` / `dataconfig` crawl; produces `joblog` rows; populates `crawlinginfo` history. +- `user` → references `role`, `group`. +- `keymatch`, `boostdoc`, `elevateword`, `badword`, `relatedcontent`, `relatedquery` → tune search-time behavior; depend on indexed documents existing. +- `accesstoken` → grants permissions for fessctl itself; deleting your own active token logs you out. + +When deleting, work outward from leaves: delete `user`s before the `role`s they reference, delete `webconfig`s before the `labeltype`s they cite (or accept that Fess will tolerate dangling references but log warnings). 
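
The pagination rule described above (increment `--page` until an empty page comes back) can be wrapped in a small loop. This is a sketch under stated assumptions rather than a fessctl feature: `list_all` is a hypothetical helper name, it needs `jq`, and it assumes `list --output json` prints a top-level JSON array. It uses only the `--page`, `--size`, and `--output` flags documented in this file.

```bash
# list_all: hypothetical helper, not part of fessctl.
# Usage: list_all fessctl webconfig list
# Runs the given list command page by page and prints one compact JSON
# object per row, stopping at the first empty page.
list_all() {
  local page=1 size=100 rows count
  while :; do
    rows="$("$@" --page "$page" --size "$size" --output json)" || return 1
    count="$(printf '%s' "$rows" | jq 'length')" || return 1
    # An empty array means the previous page was the last one.
    [ "$count" -eq 0 ] && break
    printf '%s' "$rows" | jq -c '.[]'
    page=$((page + 1))
  done
}
```

Because each row is emitted on its own line, the output can be piped straight into `jq -r '.id'` or counted with `wc -l`. If a server ever repeats the last page instead of returning an empty one, this loop would not terminate, so adding an upper bound on `page` is worth considering for unattended use.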
From 266e63b0a6303e5d8a2d386b40eb66469599f6c1 Mon Sep 17 00:00:00 2001 From: Shinsuke Sugaya Date: Sun, 3 May 2026 21:34:31 +0900 Subject: [PATCH 06/16] feat(skill): add troubleshooting reference --- skills/fessctl/references/troubleshooting.md | 140 +++++++++++++++++++ 1 file changed, 140 insertions(+) create mode 100644 skills/fessctl/references/troubleshooting.md diff --git a/skills/fessctl/references/troubleshooting.md b/skills/fessctl/references/troubleshooting.md new file mode 100644 index 0000000..6fbc8e1 --- /dev/null +++ b/skills/fessctl/references/troubleshooting.md @@ -0,0 +1,140 @@ +# Troubleshooting + +Symptoms and recoveries for the errors fessctl users hit most often. Each section starts with what the user sees, then likely causes and fixes. + +## `401 Unauthorized` / `403 Forbidden` + +``` +HTTP 401: Unauthorized +``` + +Causes: + +- `FESS_ACCESS_TOKEN` is unset, mistyped, or has been deleted in the admin UI. +- The token was issued for a non-admin user; most fessctl operations require admin (`Radmin-api`) permission. +- You exported the token in one shell but are running fessctl from another shell where the env var was not inherited. + +Recovery: + +```bash +printenv FESS_ACCESS_TOKEN >/dev/null && echo exported || echo "UNSET or not exported" # 1. confirm the env var is exported, without printing the token +fessctl ping # 2. ping does NOT need a token; if this works the endpoint is reachable +fessctl user list --size 1 # 3. this DOES need a token +# If still 401: re-issue from the admin UI (the CLI path below only works with another valid admin token exported) +fessctl accesstoken create --name claude-cli --permissions "Radmin-api" +export FESS_ACCESS_TOKEN= +``` + +## `404 Not Found` + +``` +HTTP 404: Not Found +``` + +Causes: + +- The `--id` you passed does not exist in this Fess server (typo, copy-pasted from a different env, or the resource was deleted). +- `FESS_ENDPOINT` points at the wrong server. +- A previous `delete` ran successfully and the ID is now gone. + +Recovery: + +```bash +fessctl list --output json | jq '.[] | {id,name}' +``` + +Find the live ID, retry. 
Resource IDs do not survive an export/import — see `references/conventions.md`. + +## `Connection refused` / DNS failure + +``` +httpx.ConnectError: [Errno 61] Connection refused +``` + +Causes: + +- Fess server is not running, or is not listening on the port in `FESS_ENDPOINT`. +- You are running fessctl inside Docker but `FESS_ENDPOINT=http://localhost:8080` resolves to the *container's* loopback, not the host. +- The endpoint URL is missing the scheme (`https://...` vs `host:port`). + +Recovery: + +```bash +curl -fsS "${FESS_ENDPOINT}/" >/dev/null && echo OK || echo "endpoint unreachable" +``` + +For Docker → host: + +- macOS / Windows: invoke with `--add-host=host.docker.internal:host-gateway` and set `FESS_ENDPOINT=http://host.docker.internal:8080`. +- Linux: pass `--network host` and keep `FESS_ENDPOINT=http://localhost:8080`. + +## API version mismatch + +Symptoms vary: missing fields, unexpected `400 Bad Request`, or `KeyError` when fessctl parses a response. + +Cause: `FESS_VERSION` does not match the running server's major.minor. fessctl shapes some requests/responses by version. + +Recovery: + +```bash +curl -fsS "${FESS_ENDPOINT}/api/v1/health" | jq . # find the running Fess version +export FESS_VERSION= +``` + +If the Fess server is newer than any version fessctl knows about, you may also need to update fessctl itself. + +## SSL / certificate errors + +``` +ssl.SSLCertVerificationError: certificate verify failed +``` + +Cause: Fess is fronted by HTTPS with a self-signed or internal-CA certificate, and the OS / Python trust store does not include that CA. + +Recovery: fessctl uses `httpx` with the system trust store (no dedicated `FESS_VERIFY_SSL` env var exists in v0.1.0 — see `src/fessctl/api/client.py`). The pragmatic fixes are: + +- Add the issuing CA to the system trust store (`/etc/ssl/certs`, macOS Keychain, Windows certificate manager). +- For local dev only, terminate TLS at a reverse proxy and point fessctl at the plain-HTTP backend. 
+ +Do not edit fessctl source to disable verification. If you genuinely need to, raise an issue upstream rather than carrying a local patch. + +## Empty `list` results when data should exist + +Causes: + +- `--page` advanced past the last page. +- A filter combination upstream of fessctl (e.g. shell quoting issues) is hiding rows. + +Recovery: + +```bash +fessctl list --page 1 --size 100 --output json | jq 'length' +``` + +If `length` is 0, the resource really is empty in this environment. Confirm you are pointed at the right `FESS_ENDPOINT`. + +## `uv run fessctl` slow on the first invocation + +Cause: cold virtualenv build inside `repos/fessctl/.venv`. + +Recovery: run `uv sync` once up front in `repos/fessctl`. Subsequent `uv run fessctl ...` calls reuse the cached environment and start in well under a second. + +## Docker pull or auth errors against `ghcr.io` + +``` +denied: requested access to the resource is denied +``` + +Causes: + +- Rate limiting from anonymous pulls. +- Pulling from a private mirror without prior `docker login`. + +Recovery: + +```bash +docker login ghcr.io # if behind auth +docker pull ghcr.io/codelibs/fessctl:0.1.0 # pin a specific tag, not `latest` +``` + +If `latest` is unavailable in your environment, prefer a pinned version tag matching `FESS_VERSION`. From dc2e705b5abed7ae77472b8bcc7599fd94492b82 Mon Sep 17 00:00:00 2001 From: Shinsuke Sugaya Date: Sun, 3 May 2026 21:34:31 +0900 Subject: [PATCH 07/16] feat(skill): add workflows reference --- skills/fessctl/references/workflows.md | 170 +++++++++++++++++++++++++ 1 file changed, 170 insertions(+) create mode 100644 skills/fessctl/references/workflows.md diff --git a/skills/fessctl/references/workflows.md b/skills/fessctl/references/workflows.md new file mode 100644 index 0000000..fd792b2 --- /dev/null +++ b/skills/fessctl/references/workflows.md @@ -0,0 +1,170 @@ +# Multi-feature recipes + +Each per-feature reference covers one resource. 
This file collects the small number of operations that genuinely span multiple features and need a defined order. For day-to-day single-resource CRUD, prefer the per-feature file. + +All recipes assume `FESS_ENDPOINT`, `FESS_ACCESS_TOKEN`, and `FESS_VERSION` are exported. See `references/authentication.md`. + +--- + +## Recipe 1 — Stand up a fresh web crawl + +**Goal:** Take a brand-new Fess server from "no crawl configs" to "crawled and searchable" using only fessctl. + +**Prerequisites:** Fess is running, an admin token exists, target URL is reachable. + +```bash +# 1. Smoke test +fessctl ping + +# 2. (Optional) Auth credentials for the target site +# Skip this step if the target is public. +fessctl webauth create --name corp-portal \ + --hostname portal.example.com \ + --username svc-crawler \ + --password '' + +# 3. (Optional) Label so search results can be filtered later +fessctl labeltype create --name engineering --value Engineering + +# 4. The crawl config itself +fessctl webconfig create \ + --name corp-portal-docs \ + --url "https://portal.example.com/docs/" \ + --included-urls "https://portal.example.com/docs/.*" \ + --boost 1.0 + +# 5. Find the scheduler job that runs web crawls +fessctl scheduler list --output json \ + | jq '.[] | select(.name | test("WebCrawler"; "i")) | {id,name,available}' + +# 6. Trigger that job (subcommand may vary; confirm with --help) +fessctl scheduler --help + +# 7. Confirm the crawl started and finished +fessctl joblog list --output json | jq '.[0]' +``` + +**Verify:** the matching `joblog` entry transitions to a non-error terminal state, and `crawlinginfo list` shows a session for the new config. + +--- + +## Recipe 2 — Export settings from one environment, import into another + +**Goal:** Move crawl/auth/label configuration from `fess-staging` to `fess-prod`. + +**Prerequisites:** admin tokens for both environments; fields like IDs and timestamps will be regenerated on the destination. + +```bash +# 1. 
Export from source +export FESS_ENDPOINT=https://fess-staging.example.com +export FESS_ACCESS_TOKEN=$STAGING_TOKEN +mkdir -p ./fess-export +for r in webauth fileauth labeltype webconfig fileconfig dataconfig pathmap; do + fessctl "$r" list --output yaml > "./fess-export/${r}.yaml" +done + +# 2. Switch to destination +export FESS_ENDPOINT=https://fess-prod.example.com +export FESS_ACCESS_TOKEN=$PROD_TOKEN + +# 3. Re-create each resource on the destination (no native --from-file at v0.1.0; +# you must read each YAML and reissue create with explicit flags). For example: +yq '.data[] | "fessctl labeltype create --name=" + .name + " --value=" + .value' \ + ./fess-export/labeltype.yaml | sh +``` + +**Verify:** `fessctl list` on the destination matches what you expected. Diff the YAML exports if the migration is reversible. Re-creating dependents (webconfig depends on labeltype/webauth) requires that you reload IDs after the prerequisite resources land. + +--- + +## Recipe 3 — Investigate a failing crawl + +**Goal:** Find out why a scheduled crawl is producing errors or no documents. + +```bash +# 1. Which job is failing? +fessctl scheduler list --output json \ + | jq '.[] | select(.available == true)' \ + | head + +# 2. Most recent job runs and their statuses +fessctl joblog list --output json \ + | jq '.[] | {id, jobName, jobStatus, scriptResult, startTime, endTime}' \ + | head -40 + +# 3. For a specific failing run, fetch the full log +fessctl joblog get --id --output yaml + +# 4. Look at the crawl session metadata +fessctl crawlinginfo list --output json \ + | jq '.[] | {sessionId, name, createdTime, expiredTime}' \ + | head + +fessctl crawlinginfo get --id +``` + +If the joblog points at HTTP errors, suspect the target site or `webauth`. If the joblog points at index errors, suspect Fess/OpenSearch health (out of skill scope). For environment-level errors (401, connection refused) see `references/troubleshooting.md`. 
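
Step 2 of this recipe prints every recent run. When the list is long, narrowing it to the runs that did not succeed is usually the first thing wanted. The filter below is a sketch built on one assumption that should be checked against real output: that successful runs report `jobStatus` as `"ok"`. The helper name `failed_joblogs` is hypothetical, and the field names are the ones already used in step 2.

```bash
# failed_joblogs: hypothetical helper, not part of fessctl.
# Reads a joblog list (JSON array) on stdin and prints, one per line, the
# rows whose jobStatus is anything other than "ok".
# The "ok" value is an assumption; confirm it against your own joblog output.
failed_joblogs() {
  jq -c '.[] | select(.jobStatus != "ok") | {id, jobName, jobStatus, startTime}'
}

# Usage:
#   fessctl joblog list --output json | failed_joblogs
```

Selecting on "not ok" rather than on a specific failure value is deliberate: it also surfaces runs in states this sketch does not know about, at the cost of including runs that are still in progress.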
+ +--- + +## Recipe 4 — Provision a user with admin permissions + +**Goal:** Create the role and group structure first, then attach a user. Order matters so the user references existing role/group IDs. + +```bash +# 1. Role (or reuse an existing one) +fessctl role list --output json | jq -r '.[].name' +fessctl role create --name search-admin + +# 2. Group (optional; useful for organizing users) +fessctl group create --name platform-team + +# 3. The user, tied to the role and group +fessctl user create \ + --name alice \ + --password '' \ + --roles search-admin \ + --groups platform-team + +# 4. Verify +fessctl user list --output json | jq '.[] | select(.name=="alice")' +``` + +**Caution:** `delete` order is the reverse — drop users before deleting the role/group they reference, otherwise Fess keeps dangling references. + +--- + +## Recipe 5 — Tune search relevance for one query + +**Goal:** Ensure that searches for "release notes" surface a curated document and demote noisy hits. + +```bash +# 1. Boost any document whose URL matches a pattern +fessctl boostdoc create \ + --url-expr 'url:"https://docs.example.com/release/*"' \ + --boost-expr '2.0' + +# 2. Pin a specific document for a specific query term +fessctl keymatch create \ + --term 'release notes' \ + --query 'url:"https://docs.example.com/release/latest"' \ + --max-size 10 \ + --boost 100.0 + +# 3. Suggest a related query when the user types a near-term +fessctl elevateword create --suggest-word "release notes" --boost 100.0 +fessctl relatedquery create \ + --term 'release notes' \ + --queries '["changelog","what is new"]' + +# 4. Filter out noise +fessctl badword create --suggest-word "internal-test" +``` + +**Caution:** Several of these tunings take effect only after the suggest cache and/or search index are reloaded. If changes do not appear in search results, trigger the relevant scheduler job (see Recipe 1, step 5). 
+ +--- + +## Out of scope (today) + +These recipes intentionally restrict themselves to features fessctl currently wraps. Operations that depend on `failureurl`, `plugin`, `backup`, `searchlist`, `stats`, `storage`, `suggest`, `systeminfo`, or `documents` are not represented here because fessctl 0.1.0 does not expose those subcommands. When fessctl adds them, update this file with the additional recipes. From 0b09bd3c56e6ab3bbd8e88fdab67938bf74b1055 Mon Sep 17 00:00:00 2001 From: Shinsuke Sugaya Date: Sun, 3 May 2026 21:37:15 +0900 Subject: [PATCH 08/16] feat(skill): add feature references batch 1 (webconfig, fileconfig, dataconfig, scheduler, user) --- .../fessctl/references/features/dataconfig.md | 103 +++++++++++++++++ .../fessctl/references/features/fileconfig.md | 102 ++++++++++++++++ .../fessctl/references/features/scheduler.md | 109 ++++++++++++++++++ skills/fessctl/references/features/user.md | 104 +++++++++++++++++ .../fessctl/references/features/webconfig.md | 103 +++++++++++++++++ 5 files changed, 521 insertions(+) create mode 100644 skills/fessctl/references/features/dataconfig.md create mode 100644 skills/fessctl/references/features/fileconfig.md create mode 100644 skills/fessctl/references/features/scheduler.md create mode 100644 skills/fessctl/references/features/user.md create mode 100644 skills/fessctl/references/features/webconfig.md diff --git a/skills/fessctl/references/features/dataconfig.md b/skills/fessctl/references/features/dataconfig.md new file mode 100644 index 0000000..f24a8db --- /dev/null +++ b/skills/fessctl/references/features/dataconfig.md @@ -0,0 +1,103 @@ +# Datastore Configs (fessctl `dataconfig`) + +## What it is + +Fess supports crawling content from data sources that are not plain web pages or files — relational databases, CSV/TSV files, SaaS APIs (Slack, SharePoint), Elasticsearch indexes, and more. 
Each such source is configured as a **Data Store** crawl, which pairs a **handler** (the connector implementation) with a free-form **parameter** block and a **Groovy script** that maps source records to Fess document fields. + +In the admin UI this lives at **Crawler > Data Store** (`/admin/dataconfig/`). From there you can create, edit, and delete data store crawl configurations. Each entry has a name, a handler name (e.g. `DatabaseDataStore`, `CsvDataStore`, `EsDataStore`, `CsvListDataStore`, or one provided by an installed `fess-ds-*` plugin), parameters, a script, boost, permissions, virtual hosts, sort order, status, and description. + +In fessctl terminology, "datastore" specifically refers to a `fess-ds-*` connector or a built-in handler. The actual ingestion is triggered by a scheduler job that invokes the configured handler against the configured parameters. + +## When to use + +- Crawl a relational database (MySQL, PostgreSQL, Oracle) using `DatabaseDataStore` and a JDBC driver placed in `app/WEB-INF/lib`. +- Index CSV/TSV files in bulk via `CsvDataStore`, mapping each row to a Fess document. +- Crawl a large file tree incrementally with `CsvListDataStore`, where the CSV lists changed paths and actions (create/modify/delete). +- Ingest content from a SaaS source via an installed plugin — e.g. `fess-ds-slack`, `fess-ds-sharepoint`, `fess-ds-confluence`, `fess-ds-s3`. Each plugin registers its own handler name. 
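
Because the handler parameter block is newline-delimited `key=value` text, it can be assembled with `printf` rather than one long ANSI-C `$'...'` string. The following is a minimal sketch under that assumption; `build_params` is a hypothetical helper (not part of fessctl), and the JDBC URL and username are placeholders.

```bash
# Hypothetical helper (not part of fessctl): join key=value pairs with newlines.
build_params() {
  # printf reuses the format once per argument, giving one pair per line.
  printf '%s\n' "$@"
}

# Command substitution strips the trailing newline, leaving a clean block.
# The connection details below are placeholder values.
params=$(build_params \
  'driver=org.postgresql.Driver' \
  'url=jdbc:postgresql://db.example.com:5432/shop' \
  'username=fess')

printf '%s\n' "$params"
```

The resulting string could then be passed as `--handler-parameter "$params"` to `fessctl dataconfig create`; the same approach should work for `--handler-script`.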
+ +## Subcommand surface + +| Subcommand | Purpose | Key arguments | +|---|---|---| +| `create` | Create a new DataConfig | `--name`, `--handler-name`, `--boost`, `--available`, `--sort-order`, `--description`, `--handler-parameter`, `--handler-script`, `--permission` (repeatable), `--virtual-host` (repeatable), `--created-by`, `--created-time`, `--output` | +| `update` | Update an existing DataConfig by ID | `config_id` (positional); same flags as `create`, all optional; plus `--updated-by`, `--updated-time`, `--output` | +| `delete` | Delete a DataConfig by ID | `config_id` (positional), `--output` | +| `get` | Retrieve one DataConfig by ID | `config_id` (positional), `--output` | +| `list` | List DataConfigs (paginated) | `--page`/`-p`, `--size`/`-s`, `--output`/`-o` | + +Always reconfirm with `fessctl dataconfig --help`. + +## Resource JSON shape + +```json +{ + "crud_mode": 1, + "name": "products-db", + "handler_name": "DatabaseDataStore", + "handler_parameter": "driver=com.mysql.cj.jdbc.Driver\nurl=jdbc:mysql://db.example.com:3306/shop?useUnicode=true&characterEncoding=UTF-8\nusername=fess\npassword=secret\nsql=select id,title,body,updated_at from products", + "handler_script": "url=\"https://shop.example.com/p/\" + id\nhost=\"shop.example.com\"\nsite=\"shop.example.com\"\ntitle=title\ncontent=body\ndigest=body\ncontent_length=body.length()\nlast_modified=updated_at", + "boost": 1.0, + "available": "true", + "sort_order": 1, + "description": "Product catalog from MySQL", + "permissions": "{role}guest", + "virtual_hosts": "", + "created_by": "admin", + "created_time": 1735689600000 +} +``` + +Required on create: `name`, `handler_name`. Recommended in practice: `handler_parameter` (newline-delimited `key=value`) and `handler_script` (Groovy, also `key=value` per line). 
Optional with defaults: `boost` = `1.0`, `available` = `"true"`, `sort_order` = `1`, `description` = `""`, `permissions` = `"{role}guest"` (joined by newlines if multiple `--permission` flags), `virtual_hosts` = `""` (joined by newlines), `created_by` = `"admin"`, `created_time` = current epoch milliseconds. On update, `crud_mode` is set to `2`, and `updated_by` and `updated_time` are sent as well. + +## Relationships + +- **labeltype** — assign labels through the script or via permissions to let users filter ingested documents in the UI. +- **scheduler** — actual crawling is triggered by a scheduler job (typically the `Default Crawler` job, or a job whose script calls the data-store handler). Without an enabled scheduler job, a DataConfig is inert. +- **fess-ds-\*** plugin — non-built-in `handler_name` values require the corresponding plugin to be installed on the Fess server. fessctl 0.1.0 does not expose a `plugin` subcommand, so install it outside fessctl (for example from the admin UI). The handler name in the config must match the plugin's registered name exactly. +- **JDBC driver / native libs** — `DatabaseDataStore` needs the matching JDBC `.jar` in `app/WEB-INF/lib` and a server restart to be picked up. +- **virtualhost** — controls which virtual host the produced documents are visible under. + +## Gotchas + +- The `handler_name` must match a class or plugin currently loaded by Fess. Mistyped names produce an error only at crawl time, not at create time. +- For databases, the JDBC driver class name varies by driver version. MySQL Connector/J 8.x uses `com.mysql.cj.jdbc.Driver`; Connector/J 5.x used `com.mysql.jdbc.Driver`. PostgreSQL uses `org.postgresql.Driver`. +- `handler_parameter` and `handler_script` are newline-delimited `key=value`. Pass each one as a single string and let your shell's quoting turn `\n` into real line breaks (e.g. `--handler-parameter $'driver=...\nurl=...\n...'`). +- The script is **Groovy**. Strings need double quotes; column names from a SQL row become Groovy variables. Mismatches between SQL column names and script variables silently produce empty fields. 
+- Slack and similar SaaS plugins require workspace-scoped tokens (`xoxb-...`) with the right OAuth scopes. Store them in `handler_parameter` rather than committing them to source control. +- `permissions` must use the prefixes `{role}`, `{group}`, `{user}` (e.g. `{role}guest`). Bare names will not authorize anyone. +- The `available` field is sent as the string `"true"` or `"false"`, not a boolean. +- Behavior of specific `fess-ds-*` plugins is version-pinned to the Fess release. Always cross-check the plugin's README against your `FESS_VERSION`. + +## Examples + +```bash +# Create a JDBC datastore that crawls a products table. +fessctl dataconfig create \ + --name products-db \ + --handler-name DatabaseDataStore \ + --handler-parameter $'driver=com.mysql.cj.jdbc.Driver\nurl=jdbc:mysql://db.example.com:3306/shop?useUnicode=true&characterEncoding=UTF-8\nusername=fess\npassword=secret\nsql=select id,title,body,updated_at from products' \ + --handler-script $'url="https://shop.example.com/p/" + id\nhost="shop.example.com"\nsite="shop.example.com"\ntitle=title\ncontent=body\ndigest=body\ncontent_length=body.length()\nlast_modified=updated_at' \ + --boost 1.0 \ + --description "Product catalog from MySQL" \ + --permission '{role}guest' +``` + +```bash +# Toggle an existing datastore off and bump its sort order. +fessctl dataconfig update \ + --available false \ + --sort-order 50 \ + --description "Disabled pending schema migration" +``` + +```bash +# List datastores and filter to those whose name starts with "slack-". 
+fessctl dataconfig list --size 200 --output json \ + | jq '.response.settings[] | select(.name | startswith("slack-")) | {id,name,handler_name,available}' +``` + +## See also + +- fess-docs: en/15.6/admin/dataconfig-guide.rst +- workflows.md: n/a +- Related features: `references/features/labeltype.md`, `references/features/scheduler.md` diff --git a/skills/fessctl/references/features/fileconfig.md b/skills/fessctl/references/features/fileconfig.md new file mode 100644 index 0000000..bca4f71 --- /dev/null +++ b/skills/fessctl/references/features/fileconfig.md @@ -0,0 +1,102 @@ +# File Crawl Configs (fessctl `fileconfig`) + +## What it is +File Crawl Configs define how the Fess crawler walks a file system or remote object store and indexes the documents it finds. Each config carries a starting path (for example `file:/`, `smb://`, `s3://`, `gcs://`), include/exclude regular expressions, depth and concurrency limits, permissions, and arbitrary client parameters that get passed to the underlying crawler client (S3, GCS, SMB, etc.). + +In the Fess admin UI these records are managed under **Crawler > File System**. Authentication for protected shares (SMB, SFTP, S3, GCS credentials, etc.) is stored separately under **Crawler > File Authentication** and joined to a config by hostname/scheme. Indexing is normally executed by the default crawler scheduler job, which only picks up configs whose `available` flag is true. + +The `fessctl fileconfig` subcommands are a one-to-one wrapper over the corresponding `/api/admin/fileconfig` admin endpoints used by that screen. + +## When to use +- Onboarding a new shared folder, S3 bucket, or GCS bucket that should be searchable from Fess. +- Tightening or widening crawl scope by editing include/exclude regex without touching the UI. +- Bulk-disabling a noisy file source by flipping `--available false` ahead of the next scheduled job. 
+- Exporting an existing file config (`get -o yaml`) so it can be version-controlled or replayed in another environment. + +## Subcommand surface + +| Subcommand | Purpose | Notes | +| --- | --- | --- | +| `create` | Register a new file crawl config. | `--name` and at least one `--path` are required; other options take crawler-friendly defaults (1 thread, 10000 ms interval, depth 1, max 1,000,000 docs, permission `{role}guest`). | +| `update` | Patch an existing config by ID. | Fetches the current record first, applies only the flags you pass, and re-submits with `crud_mode=2`. ID is a positional argument. | +| `delete` | Remove a config by ID. | Hard delete; pair with `get -o yaml` first if you need a backup. | +| `get` | Show a single config in markdown, JSON, or YAML. | Renders all stored fields including `version_no`, `created_time`, and `updated_time` (epoch ms is converted to UTC ISO 8601 in text mode). | +| `list` | List configs with `--page` / `--size` paging. | Default size is 100; text output shows ID and Name only — switch to `-o json` for the full payload. | + +Always reconfirm with `fessctl fileconfig --help`. 
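
Since `delete` is a hard delete, the "export first" advice in the table can be wrapped in a guard. This is a rough sketch, assuming `get` and `delete` take the config ID positionally the way `update` does; `backup_then_delete` and the `./fess-backup` directory are hypothetical names, not part of fessctl.

```bash
# Hypothetical wrapper (not part of fessctl): export a file config to YAML,
# and delete it only if the export succeeded and produced a non-empty file.
# Assumes `get` and `delete` accept the ID as a positional argument.
backup_then_delete() {
  id=$1
  dir=${2:-./fess-backup}   # backup directory; the default name is an assumption
  mkdir -p "$dir" || return 1
  if fessctl fileconfig get "$id" -o yaml > "$dir/fileconfig-$id.yaml" \
     && [ -s "$dir/fileconfig-$id.yaml" ]; then
    fessctl fileconfig delete "$id"
  else
    echo "backup failed; not deleting $id" >&2
    return 1
  fi
}

# Usage (FCONFIG_ID is a placeholder):
#   backup_then_delete FCONFIG_ID ./fess-backup
```

Checking both the exit status and the file size avoids deleting a config when the export silently wrote nothing.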
+ +## Resource JSON shape + +```json +{ + "crud_mode": 1, + "name": "Shared Folder", + "paths": "smb://SERVER/SharedFolder/", + "num_of_thread": 1, // optional, default 1 + "interval_time": 10000, // optional, default 10000 (ms) + "boost": 1.0, // optional, default 1.0 + "available": true, // optional, default true + "sort_order": 1, // optional, default 1 + "description": "", // optional, default "" + "label_type_ids": [], // optional, default [] + "included_paths": "", // optional, newline-joined regex list + "excluded_paths": "", // optional, newline-joined regex list + "included_doc_paths": "", // optional, newline-joined regex list + "excluded_doc_paths": "", // optional, newline-joined regex list + "config_parameter": "", // optional, newline-joined key=value pairs + "depth": 1, // optional, default 1 + "max_access_count": 1000000, // optional, default 1000000 + "permissions": "{role}guest", // optional, default "{role}guest" + "virtual_hosts": "", // optional, newline-joined hostnames + "created_by": "admin", // optional, default "admin" + "created_time": 1714694400000 // optional, defaults to current UTC epoch ms +} +``` + +Repeated CLI flags (for example multiple `--path` or `--config-parameter` arguments) are joined with newlines before being sent to the API, mirroring how the admin form serializes textarea fields. + +## Relationships +- **`labeltype`**: `--label-type-id` values must exist as Label Type IDs; create them first via `fessctl labeltype create` so search results can be filtered by label. +- **`fileauth`**: SMB/FTP/SFTP/S3/GCS sources that need credentials require a matching File Authentication record (`fessctl fileauth ...`) keyed by the same hostname/scheme. +- **`scheduler`**: Only configs with `available=true` are picked up by the default crawler job. Use `fessctl scheduler` to inspect or trigger that job. 
+- **`role` / `group` / `user`**: Strings in `--permission` (`{role}...`, `{group}...`, `{user}...`) must reference identities managed by the corresponding `fessctl` resource commands. +- **`webconfig` / `dataconfig`**: Sibling crawl-source resources; they share scheduler and label-type wiring but use different paths/clients. + +## Gotchas +- Path scheme matters: use `file:/absolute/path` for local filesystems, `smb://HOST/Share/` for Windows shares, `s3://bucket/`, or `gcs://bucket/`. Trailing slashes are recommended for directories so the include/exclude regex anchors behave predictably. +- Include/exclude expressions are Java-flavored regular expressions matched against the full path; remember to escape dots and to use `.*` (not glob `*`) for wildcards. +- Cloud-storage credentials live in `--config-parameter` as `key=value` lines (e.g. `client.accessKey=...`, `client.secretKey=...`, `client.region=...` for S3, or `client.projectId=...`, `client.credentialsFile=...` for GCS). Treat these like secrets and avoid committing them to YAML exports. +- `update` requires the existing config to be readable — if `get_fileconfig` fails the command exits before sending an update, so confirm the ID with `fessctl fileconfig list` first. +- `delete` is irreversible from the CLI; there is no soft-delete or trash recovery in Fess. +- `created_time` / `updated_time` are epoch milliseconds (UTC). The CLI defaults them to "now"; only override when reproducing historical state. +- Available since Fess 15.6 admin API; older versions may reject newer fields such as `gcs://` schemes or extra client parameters. 
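
The regex gotcha above (escape dots, use `.*` instead of a glob `*`) can be illustrated with a tiny converter. It is only a sketch: `glob_to_regex` is a hypothetical helper that handles `.` and `*` and nothing else, so patterns containing other regex metacharacters would still need manual review.

```bash
# Hypothetical helper (not part of fessctl): turn a simple glob into a regex.
# Only "." and "*" are rewritten; other metacharacters (?, +, parentheses)
# are passed through unchanged.
glob_to_regex() {
  # First escape literal dots, then expand each glob star to ".*".
  printf '%s\n' "$1" | sed -e 's/[.]/\\./g' -e 's/[*]/.*/g'
}

glob_to_regex 'smb://SERVER/Share/*.tmp'
# prints: smb://SERVER/Share/.*\.tmp
```

The output could be fed to an option such as `--excluded-path "$(glob_to_regex 'smb://SERVER/Share/*.tmp')"`. Escaping dots before expanding stars matters, because the reverse order would also escape the dot in each generated `.*`.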
+ +## Examples + +```bash +# Minimal create: crawl a local share with defaults +fessctl fileconfig create \ + --name "Share Directory" \ + --path "file:/home/share" +``` + +```bash +# Typical update: tighten scope and lower concurrency on an existing config +fessctl fileconfig update FCONFIG_ID \ + --excluded-path ".*/\.git/.*" \ + --excluded-path ".*\.tmp$" \ + --num-of-thread 2 \ + --interval-time 5000 \ + --available true +``` + +```bash +# List and filter via JSON output +fessctl fileconfig list --size 200 --output json \ + | jq '.response.settings[] | select(.paths | startswith("smb://")) | {id, name, paths}' +``` + +## See also +- fess-docs: en/15.6/admin/fileconfig-guide.rst +- workflows.md: n/a +- Related features: `references/features/fileauth.md`, `references/features/labeltype.md`, `references/features/scheduler.md` diff --git a/skills/fessctl/references/features/scheduler.md b/skills/fessctl/references/features/scheduler.md new file mode 100644 index 0000000..8e640fb --- /dev/null +++ b/skills/fessctl/references/features/scheduler.md @@ -0,0 +1,109 @@ +# Scheduler / Jobs (fessctl `scheduler`) + +## What it is + +The Scheduler manages cron-like job definitions inside Fess. In the admin UI it lives at **System > Scheduler** (`/admin/scheduler`). Each scheduler row is a named job with a CRON expression, an executor (currently only `groovy`), and a script body. The script typically invokes the `crawlJob` component to start crawls against one or more `webconfig` / `fileconfig` / `dataconfig` IDs, but it can run any Groovy code the container exposes. + +A scheduler "job" is the *definition*; an actual *run* is what happens when the cron fires (or when the job is manually started). Each run, when `job_logging` is enabled, produces one row in the **joblog** index, and while a crawler-type job is alive it also populates **crawlinginfo** rows for progress and statistics. 
The `target` field acts as an identifier filter so jobs can be invoked selectively from batch commands; use `all` if you do not run jobs from the CLI batch entry point. + +The default Fess install ships a "Default Crawler" job which is what most operators trigger from the UI's **Start Now** button. + +## When to use + +- Schedule a recurring crawl (e.g. nightly at 02:00) against a fixed set of webconfig / fileconfig / dataconfig IDs. +- Trigger a one-off crawl ad-hoc without waiting for the cron tick (`scheduler start`). +- Disable a recurring job during maintenance windows without deleting it (`scheduler update --available false`). +- Stop an in-flight job that is misbehaving or consuming too many resources (`scheduler stop`). +- Inspect or export the script body of an existing job before editing it via `scheduler get -o yaml`. + +## Subcommand surface + +| Subcommand | Purpose | +|------------|---------| +| `create` | Create a new scheduler job (name, target, script_type, cron_expression, script_data, crawler, job_logging, available, sort_order, created_by, created_time). | +| `update` | Update an existing scheduler by ID. Only the flags you pass are overwritten; everything else is preserved by re-fetching the current setting. | +| `delete` | Delete a scheduler by ID. Removes the definition; does not abort an in-flight run. | +| `get` | Fetch a single scheduler by ID and render details (id, name, target, cron, script_type, script_data, crawler, job_logging, available, sort_order, audit fields). | +| `list` | List schedulers with `--page` / `--size` pagination. Default page size is 100. | +| `start` | Start a scheduler run immediately. Returns a `jobLogId` on success when logging is enabled. | +| `stop` | Stop a currently running scheduler. | + +Always reconfirm with `fessctl scheduler --help`. 
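
For jobs that crawl a fixed set of web configs, the Groovy script body can be generated from a list of IDs instead of being edited by hand. The sketch below reuses the `crawlJob` script template from this file's own examples; `crawl_script` is a hypothetical helper, and it assumes the IDs contain no quotes or backslashes.

```bash
# Hypothetical helper (not part of fessctl): build the crawlJob script body
# for a list of webconfig IDs. Assumes IDs contain no quotes or backslashes.
crawl_script() {
  ids=""
  for id in "$@"; do
    # Append each ID as a quoted Groovy string, comma-separated.
    ids="${ids:+$ids,}\"$id\""
  done
  printf 'return container.getComponent("crawlJob").logLevel("info").webConfigIds([%s] as String[]).fileConfigIds([] as String[]).dataConfigIds([] as String[]).execute(executor);\n' "$ids"
}

crawl_script 1 2
```

The output could be passed as `--script-data "$(crawl_script 1 2)"` to `fessctl scheduler create`. File and data config IDs are left empty here to keep the sketch short.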
+ +## Resource JSON shape + +```json +{ + "id": "abc123", + "name": "Nightly Web Crawl", + "target": "all", + "cron_expression": "0 0 2 * * ?", + "script_type": "groovy", + "script_data": "return container.getComponent(\"crawlJob\").logLevel(\"info\").webConfigIds([\"1\",\"2\"] as String[]).fileConfigIds([] as String[]).dataConfigIds([] as String[]).execute(executor);", + "job_logging": "true", + "crawler": "true", + "available": true, + "sort_order": 1, + "created_by": "admin", + "created_time": 1735689600000, + "updated_by": "admin", + "updated_time": 1735689600000, + "version_no": 1, + "crud_mode": 1 +} +``` + +Required on create: `name`, `target`, `script_type`. The other fields default to empty strings, `available=true`, and `sort_order=1` in the CLI. Note that `crawler` and `job_logging` are passed as strings (e.g. `"true"` / `"false"`) by the admin API, while `available` is a boolean. + +## Relationships + +- Scheduler jobs trigger crawls against **webconfig**, **fileconfig**, and **dataconfig** entries by referencing their IDs in the Groovy `script_data`. +- Each run, when `job_logging` is enabled, produces one **joblog** row (start/end time, status, log target). +- While a crawler-type job is running, it populates **crawlinginfo** rows with per-session progress, document counts, and parameters. +- Setting `available=false` (or deleting) prevents *future* runs from firing on the cron, but does **not** abort an in-flight run — use `scheduler stop` for that. +- The `target` value is also used by `job.max.crawler.processes` (in `fess_config.properties`) to cap concurrent crawler-typed jobs. + +## Gotchas + +- Cron syntax follows the **Quartz** format used by LastaJob. The admin UI hint shows 5 fields (`minute hour day month day-of-week`), but the underlying engine accepts the 6/7-field Quartz form (e.g. `0 0 2 * * ?`). Test before relying on it. 
+- `script_type` is effectively limited to `groovy` today; passing anything else will create the row but the executor will fail at runtime. +- Groovy scripts run inside the Fess JVM with admin privileges — treat `script_data` as trusted code and review carefully on `update`. +- The schedule fires in the **server's local timezone**, not UTC. Verify the OS / container timezone before choosing cron values. +- `available=false` and "deleted" look similar in effect (no future runs) but only `delete` removes the row; prefer `available=false` for temporary disables so you can re-enable without losing the script. +- `start` returns immediately with a `jobLogId`; the run continues asynchronously. Poll **joblog** to know when it finished. +- `stop` only signals the running job; long-running steps (e.g. a single slow HTTP fetch) may take time to actually halt. +- `update` first re-fetches the current setting and merges your flags on top, so omitting a flag leaves the previous value intact — this is the safe default but means you cannot clear a string field by passing an empty string through the CLI. + +## Examples + +```bash +# Minimal: create a default-style web crawler job that runs nightly at 02:00 +fessctl scheduler create \ + --name "Nightly Web Crawl" \ + --target "all" \ + --script-type "groovy" \ + --cron-expression "0 0 2 * * ?" \ + --script-data 'return container.getComponent("crawlJob").logLevel("info").webConfigIds(["1"] as String[]).fileConfigIds([] as String[]).dataConfigIds([] as String[]).execute(executor);' \ + --job-logging "true" \ + --crawler "true" +``` + +```bash +# Typical update: temporarily disable a job and shift its schedule to weekends only +fessctl scheduler update \ + --available false \ + --cron-expression "0 0 3 ? 
* SAT,SUN" +``` + +```bash +# Trigger a one-off run now, then list jobs to see status +fessctl scheduler start +fessctl scheduler list +# Inspect the resulting run via the joblog feature using the returned jobLogId +``` + +## See also + +- fess-docs: en/15.6/admin/scheduler-guide.rst +- workflows.md: Recipe 1 (initial setup), Recipe 3 (failing crawl investigation) +- Related features: `references/features/webconfig.md`, `references/features/fileconfig.md`, `references/features/dataconfig.md`, `references/features/joblog.md`, `references/features/crawlinginfo.md` diff --git a/skills/fessctl/references/features/user.md b/skills/fessctl/references/features/user.md new file mode 100644 index 0000000..279df2e --- /dev/null +++ b/skills/fessctl/references/features/user.md @@ -0,0 +1,104 @@ +# Users (fessctl `user`) + +## What it is + +The User feature manages the authenticated principals that can log in to Fess (both administrators and end users for permission-restricted search). It is reached through the admin UI menu path **User → User**, where you can list, create, edit, and delete user accounts. Each user has a name and password, and may be associated with one or more roles and/or groups. + +In Fess's authorization model, a *user* is the identity that authenticates, a *role* is a permission grant attached to crawl configurations and documents, and a *group* is an organizational bucket for collecting users. Roles drive access control on indexed content; groups are an additional grouping primitive that can also be used in role-based filtering. A user inherits the union of permissions from every role and group it belongs to. + +The `fessctl user` command exposes the same admin REST API used by the UI, so it is suitable for scripted provisioning, CI bootstrap, and bulk audits. + +## When to use + +- Provision a new administrator account during initial Fess bootstrap (a user attached to the `admin` role). 
+- Create a search-only viewer account so internal users can hit role-restricted documents. +- Rotate or reset the password of an existing account after a security event. +- Audit the user directory by dumping `list --output json` and filtering with `jq`. +- Re-assign a user from one team's group/role to another during a re-org. + +## Subcommand surface + +| Subcommand | Purpose | Key arguments / options | +|------------|---------|-------------------------| +| `create` | Create a new user. | `name` (arg), `password` (arg), `--attribute/-a key=value` (repeatable), `--role/-r` (repeatable), `--group/-g` (repeatable), `--output/-o` | +| `update` | Update an existing user (fetched then merged). | `user_id` (arg), `--password/-p`, `--attribute/-a`, `--role/-r`, `--group/-g`, `--updated-by`, `--updated-time`, `--output/-o` | +| `delete` | Delete a user by ID. | `user_id` (arg), `--output/-o` | +| `getbyname` | Resolve a user by name (URL-safe base64 encodes the name and calls `get`). | `name` (arg), `--output/-o` | +| `get` | Retrieve a user's full record by ID. | `user_id` (arg), `--output/-o` | +| `list` | List users with pagination. | `--page/-p` (default 1), `--size/-s` (default 100), `--output/-o` | + +Always reconfirm with `fessctl user --help`. + +## Resource JSON shape + +```json +{ + "crud_mode": 2, + "id": "QWRtaW4", + "name": "alice", + "password": "PLAINTEXT_AT_SUBMIT", + "confirm_password": "PLAINTEXT_AT_SUBMIT", + "roles": ["admin", "search-user"], + "groups": ["engineering"], + "attributes": { + "mail": "alice@example.com", + "surname": "Doe", + "givenName": "Alice", + "employeeNumber": "E-1024", + "departmentNumber": "42", + "title": "Staff Engineer" + }, + "updated_by": "admin", + "updated_time": 1714694400000, + "version_no": 1 +} +``` + +Required on create: `name` and `password` (the CLI auto-fills `confirm_password` to match). `roles`, `groups`, and `attributes` are optional. 
Extended LDAP-style fields such as `mail`, `surname`, `givenName`, `employeeNumber`, etc. are passed through the `--attribute` map rather than as top-level flags. + +## Relationships + +- A user references **role** names (strings) via the `roles` array. Each name must already exist as a Fess role; see `references/features/role.md`. +- A user references **group** names (strings) via the `groups` array. Each name must already exist as a Fess group; see `references/features/group.md`. +- API tokens issued for this principal live in the access-token resource; see `references/features/accesstoken.md`. +- Deletion order: delete (or reassign) the **users** first, then the role or group they reference. Removing a role or group that is still listed inside an active user record can leave dangling permission strings. +- `roles` and `groups` are matched by **name string**, not by internal ID, so renaming a role/group requires updating every user that references it. + +## Gotchas + +- Passwords are submitted in plaintext over the admin API; always run against TLS-protected endpoints. Fess hashes them server-side per `fess_config.properties`. +- The built-in `admin` user is special: deleting it locks you out of the admin UI. Rotate its password with `update` instead of recreating it. +- Password rotation is only possible through `update` (`--password`); there is no dedicated `passwd` subcommand. The CLI rejects passwords longer than 100 characters. +- Deleting the user you are currently authenticated as immediately invalidates your session. +- Role and group references are plain strings: a typo silently grants no permissions rather than erroring. +- `update` performs read-modify-write: omitted flags fall back to the existing values, but explicitly passing `--role` or `--group` **replaces** the entire list, it does not append. +- `getbyname` works by URL-safe base64-encoding the name and reusing `get`; it will fail the same way `get` does if the resulting ID is unknown. 
+- `--updated-time` defaults to "now" in epoch milliseconds; only override it for deterministic reproductions or imports. + +## Examples + +```bash +# Minimal create: just username and password +fessctl user create alice 'S3cret!Pass' +``` + +```bash +# Typical update: rotate password and replace role/group assignment +fessctl user update QWxpY2U \ + --password 'N3wS3cret!' \ + --role admin \ + --role search-user \ + --group engineering +``` + +```bash +# Audit: list users as JSON and show only those with the admin role +fessctl user list --size 500 --output json \ + | jq '.response.settings[] | select(.roles | index("admin")) | {id, name, roles, groups}' +``` + +## See also + +- fess-docs: en/15.6/admin/user-guide.rst +- workflows.md: Recipe 4 (provisioning a user with admin permissions) +- Related features: `references/features/role.md`, `references/features/group.md`, `references/features/accesstoken.md` diff --git a/skills/fessctl/references/features/webconfig.md b/skills/fessctl/references/features/webconfig.md new file mode 100644 index 0000000..5005ae6 --- /dev/null +++ b/skills/fessctl/references/features/webconfig.md @@ -0,0 +1,103 @@ +# Web Crawl Configs (fessctl `webconfig`) + +## What it is + +A Web Crawl Config (Web Config) defines how the Fess crawler fetches a web site or web application: where to start crawling, which URLs to follow or index, how aggressively to fetch, and which permissions to attach to the resulting documents. In the admin UI it lives at **Crawler -> Web Config** (left sidebar: "Crawler > Web"). Each entry corresponds to one logical site (for example, a corporate portal, a wiki, or a public homepage). + +The `fessctl webconfig` subcommand is a thin CRUD wrapper over the Fess admin API for these entries. It lets you script initial crawl setup, update threading and interval, attach label types and web authentications, and roll out the same configuration across multiple environments. 
+ +## When to use + +- Bootstrapping a new Fess instance with a fixed set of sites (Infra-as-Code style). +- Tuning crawl parameters (`--num-of-thread`, `--interval-time`, `--depth`) without clicking through the admin UI. +- Disabling a site temporarily for maintenance via `--available false` instead of deleting it. +- Diffing or backing up Web Config definitions by piping `list`/`get` output through `--output json | jq`. + +## Subcommand surface + +| Subcommand | Purpose | Notes | +|------------|---------|-------| +| `create` | Create a new Web Config. | Requires `--name` and at least one `--url`. Sensible defaults for threads, interval, excluded URLs, permissions. | +| `update` | Update an existing Web Config by ID. | Takes positional `CONFIG_ID`. Only fields explicitly passed are overwritten; the rest are preserved via a read-modify-write cycle. | +| `delete` | Delete a Web Config by ID. | Irreversible. Does not cascade-delete related Web Auth entries. | +| `get` | Retrieve one Web Config by ID. | Renders a Markdown detail view by default; switch with `--output json` or `--output yaml`. | +| `list` | List Web Configs. | Supports `--page` / `--size` pagination. Default page size is 100. | + +Always reconfirm with `fessctl webconfig --help`. 
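
Because `create` needs `--name` plus one `--url` per start URL, a thin wrapper can turn a plain list of URLs into repeated flags. This is a sketch only; `create_webconfig` is a hypothetical function, not a fessctl feature, and it uses only the `--name` and `--url` options listed in the table above.

```bash
# Hypothetical wrapper (not part of fessctl): accept NAME followed by one or
# more URLs and pass each URL as its own --url flag.
create_webconfig() {
  if [ "$#" -lt 2 ]; then
    echo "usage: create_webconfig NAME URL [URL...]" >&2
    return 2
  fi
  name=$1
  shift
  # Rewrite the positional parameters from "U1 U2" to "--url U1 --url U2".
  for u in "$@"; do
    set -- "$@" --url "$u"
    shift
  done
  fessctl webconfig create --name "$name" "$@"
}

# Usage (example URLs are placeholders):
#   create_webconfig "Docs" "https://docs.example.com/" "https://wiki.example.com/"
```

Repeating the flag avoids comma-separated values and keeps URLs containing commas intact.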
+ +## Resource JSON shape + +```json +{ + "name": "Fess", // required + "urls": "https://fess.codelibs.org/", // required, newline-joined when multiple + "user_agent": "Mozilla/5.0 (compatible; Fess/FessCTL; +http://fess.codelibs.org/bot.html)", // optional, default shown + "num_of_thread": 1, // optional, default 1 + "interval_time": 10000, // optional, default 10000 (ms) + "boost": 1.0, // optional, default 1.0 + "available": true, // optional, default true + "sort_order": 1, // optional, default 1 + "description": "", // optional + "label_type_ids": [], // optional, list of label type IDs + "included_urls": "", // optional, newline-joined regex list + "excluded_urls": "(?i).*(css|js|jpeg|jpg|gif|png|bmp|wmv|xml|ico|exe)", // optional, default shown + "included_doc_urls": "", // optional + "excluded_doc_urls": "", // optional + "config_parameter": "", // optional, key=value lines (e.g. client.robotsTxtEnabled=false) + "depth": 1, // optional, default 1 + "max_access_count": 1000000, // optional, default 1000000 + "permissions": "{role}guest", // optional, default "{role}guest" + "virtual_hosts": "", // optional + "created_by": "admin", // optional, default "admin" + "created_time": 1714723200000 // optional, defaults to now (epoch ms, UTC) +} +``` + +## Relationships + +- **Label Type** (`fessctl labeltype`): attach via `--label-type-id` to surface this site under a label filter in search results. +- **Web Authentication** (`fessctl webauth`): for protected sites (BASIC, DIGEST, NTLM, Form), create the Web Config first and then bind a Web Auth entry to it by name. +- **Scheduler** (`fessctl scheduler`): the default crawler job picks up entries with `available: true`. Disable a config instead of deleting it to skip a single run. +- **Role / Group / User** (`fessctl role`, `group`, `user`): values referenced by `--permission` (`{role}name`, `{group}name`, `{user}name`) must exist for permissions to take effect. 
+- Safe deletion order: remove dependent Web Auth entries first, then delete the Web Config, then drop unused Label Types. + +## Gotchas + +- All multi-value fields (`--url`, `--included-url`, `--excluded-url`, `--config-parameter`, `--permission`, `--virtual-host`) are newline-joined into a single string before being sent to the Fess API; pass the option multiple times instead of comma-separating. +- `update` performs a read-modify-write: if the config ID does not exist the command exits non-zero. Fields not specified retain their current value. +- The default `excluded_urls` regex blocks common static assets. If you actually need to crawl JS/CSS/images, override it explicitly with `--excluded-url ""` (or a narrower pattern). +- Included/Excluded URL patterns are Java regular expressions (per the admin guide), not glob patterns. +- `--available false` removes the entry from the next default crawler run but preserves already-indexed documents; `delete` only removes the configuration record, not the indexed data. +- The `permissions` value must exactly match the Fess permission syntax (`{role}guest`, `{user}alice`, `{group}dev`); free-form names are silently ignored at search time. +- All admin-API calls require an authenticated admin user as configured in `fessctl` settings. +- Virtual host values are matched against incoming search request hosts; see the Fess virtual host documentation before populating this field. 
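The newline-joining described in the first gotcha can be made concrete with a small shell sketch. The helper name `join_lines` is hypothetical and not part of fessctl, and `https://example.com/docs/` is a made-up second URL. The helper only mimics the documented behavior of turning repeated option values into one newline-separated string, which is the shape the `urls`, `included_urls`, and `excluded_urls` fields take.

```bash
# Hypothetical helper (not part of fessctl): join all arguments with newlines,
# mimicking how repeated --url / --excluded-url values end up in one string field.
join_lines() {
  printf '%s\n' "$@"
}

# Two values passed as separate arguments become one two-line string.
urls="$(join_lines "https://fess.codelibs.org/" "https://example.com/docs/")"
printf '%s\n' "$urls"

# The equivalent fessctl call repeats the flag instead of comma-separating
# (left commented out because it needs a live Fess endpoint):
# fessctl webconfig create --name "Docs" \
#   --url "https://fess.codelibs.org/" \
#   --url "https://example.com/docs/"
```

A comma-separated value would instead reach the API as a single URL containing a comma, which is why the repeated-flag form is the one to use.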
+ +## Examples + +```bash +# Minimal: crawl a public site under default settings +fessctl webconfig create \ + --name "Fess" \ + --url "https://fess.codelibs.org/" \ + --included-url "https://fess.codelibs.org/.*" +``` + +```bash +# Typical update: bump threads and interval for an existing config +fessctl webconfig update W1a2b3c4d5 \ + --num-of-thread 3 \ + --interval-time 5000 \ + --description "Tuned for nightly full crawl" +``` + +```bash +# List all configs and filter the disabled ones for review +fessctl webconfig list --size 200 --output json \ + | jq '.response.settings[] | select(.available == false) | {id, name, sort_order}' +``` + +## See also + +- fess-docs: en/15.6/admin/webconfig-guide.rst +- workflows.md: Recipe 1 (initial web crawl setup) +- Related features: `references/features/labeltype.md`, `references/features/webauth.md`, `references/features/scheduler.md` From 9fafa897d390f0ea22cc57b83498deba622dcc5a Mon Sep 17 00:00:00 2001 From: Shinsuke Sugaya Date: Sun, 3 May 2026 21:40:01 +0900 Subject: [PATCH 09/16] feat(skill): add feature references batch 2 (role, group, accesstoken, webauth, fileauth) --- .../references/features/accesstoken.md | 94 +++++++++++++++ .../fessctl/references/features/fileauth.md | 96 +++++++++++++++ skills/fessctl/references/features/group.md | 93 +++++++++++++++ skills/fessctl/references/features/role.md | 92 +++++++++++++++ skills/fessctl/references/features/webauth.md | 109 ++++++++++++++++++ 5 files changed, 484 insertions(+) create mode 100644 skills/fessctl/references/features/accesstoken.md create mode 100644 skills/fessctl/references/features/fileauth.md create mode 100644 skills/fessctl/references/features/group.md create mode 100644 skills/fessctl/references/features/role.md create mode 100644 skills/fessctl/references/features/webauth.md diff --git a/skills/fessctl/references/features/accesstoken.md b/skills/fessctl/references/features/accesstoken.md new file mode 100644 index 0000000..a2f8f9f --- /dev/null +++ 
b/skills/fessctl/references/features/accesstoken.md @@ -0,0 +1,94 @@ +# Access Tokens (fessctl `accesstoken`) + +## What it is + +An access token is a credential issued by Fess that authorizes calls to the Fess admin and search APIs. In the admin UI it is managed under **System -> Access Token**, where each token carries a name, a permission expression, an optional request-parameter alias, and an optional expiration date. Permissions are written using the `{user|group|role}name` notation (for example `{group}developer`) and define the privilege the token confers on requests it accompanies. + +The `fessctl accesstoken` command group wraps the corresponding admin API endpoints so you can provision, inspect, modify, and revoke tokens from automation and scripts. This is useful for CI pipelines, scheduled audits, and bulk lifecycle management where the web UI is not practical. + +Note that fessctl itself authenticates every privileged call with an access token supplied via the `FESS_ACCESS_TOKEN` environment variable (see `references/authentication.md`). The token you create with `fessctl accesstoken create` is the same kind of token fessctl consumes, so this command effectively bootstraps and maintains the credentials for fessctl itself and for any other API client. + +## When to use + +- Provision a long-lived service-account token for a CI job that pushes crawl configs, schedulers, or dictionaries via fessctl. +- Rotate an existing token by creating a replacement, updating the consumer's environment, then deleting the old one. +- Audit issued tokens (names, permissions, expiration) by listing them in JSON for downstream review or compliance reporting. +- Issue a short-lived token with an `--expires` value for a one-off integration or contractor. 
+ +## Subcommand surface + +| Subcommand | Required arguments | Key options | Purpose | +|------------|--------------------|-------------|---------| +| `create` | `--name` | `--token`, `--permission` (repeatable), `--parameter-name`, `--expires`, `--created-by`, `--created-time`, `--output/-o` | Create a new access token. | +| `update` | `accesstoken_id` (positional) | `--name`, `--token`, `--permission` (repeatable), `--parameter-name`, `--expires`, `--updated-by`, `--updated-time`, `--output/-o` | Update an existing token in place. | +| `delete` | `accesstoken_id` (positional) | `--output/-o` | Delete a token by ID. | +| `get` | `accesstoken_id` (positional) | `--output/-o` | Retrieve full details for a single token. | +| `list` | (none) | `--page/-p`, `--size/-s`, `--output/-o` | List tokens with pagination. | + +Always reconfirm with `fessctl accesstoken --help`. + +## Resource JSON shape + +```json +{ + "crud_mode": 1, + "name": "ci-pipeline", + "token": "optional-explicit-token-string", + "permissions": "{role}admin\n{group}developer", + "parameter_name": null, + "expires": "2026-12-31T23:59:59", + "created_by": "admin", + "created_time": 1746230400000, + "updated_by": "admin", + "updated_time": 1746230400000 +} +``` + +Required on create: `name` and at least one entry in `permissions` (passed as repeated `--permission` flags, joined with newlines by fessctl). `expires` is optional but recommended for non-service tokens; the CLI accepts `yyyy-MM-ddTHH:mm:ss`. `parameter_name` is optional and only meaningful for trusted internal embedding scenarios. `crud_mode` is set automatically (`1` for create, `2` for update). + +## Relationships + +- The token authorizes a fessctl session: the value is sent on every API call via `FESS_ACCESS_TOKEN`, so this resource is the gate for all other `fessctl` subcommands. +- Permission strings reference users, groups, and roles defined under `references/features/user.md` and `references/features/role.md` (and groups). 
Those entities must exist for the permission to resolve at request time. +- See `references/authentication.md` for how fessctl consumes the issued token. +- The `parameter_name` field interacts with the search API's permission-via-query-parameter feature; only enable it on trusted internal networks. + +## Gotchas + +- The full token value is shown only at creation time; if you discard it you must create a new token and delete the old one. +- Deleting the token currently used by your fessctl session (the value in `FESS_ACCESS_TOKEN`) immediately invalidates that session and logs you out of subsequent API calls. +- Permission strings are exact: use the documented `{user}name`, `{group}name`, `{role}name` form (for example `{role}admin-api`). Typos silently grant nothing. +- Treat tokens as secrets: never commit them to git, never echo them into shared logs, and store them in a secrets manager or CI secret store. +- `--permission` is repeatable; pass the flag once per permission entry rather than comma-joining values. +- `--expires` uses `yyyy-MM-ddTHH:mm:ss` (no timezone suffix); the server interprets it in its configured timezone. +- `update` performs a read-modify-write: fields you do not pass are preserved from the current record, but `--permission` replaces the entire permission list when supplied. 
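Because malformed permission strings are described as failing silently, a pre-flight check in the calling script may help. The following is a minimal sketch: the function names `is_permission` and `is_expires` are hypothetical, and the checks cover only the shapes documented above (`{user}name`, `{group}name`, `{role}name`, and `yyyy-MM-ddTHH:mm:ss` without a timezone suffix). They do not confirm that the referenced user, group, or role exists, nor that the date is a real calendar date.

```bash
# Hypothetical pre-flight checks (not part of fessctl).

# Succeeds only for "{user}x", "{group}x", or "{role}x" with a non-empty name.
is_permission() {
  case "$1" in
    '{user}'?* | '{group}'?* | '{role}'?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds only for the shape yyyy-MM-ddTHH:mm:ss (digits only, no timezone suffix).
is_expires() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

perm='{role}admin-api'
expires='2026-12-31T23:59:59'
if is_permission "$perm" && is_expires "$expires"; then
  echo "ok: values have the documented shape"
else
  echo "refusing to call fessctl: malformed --permission or --expires" >&2
fi
```

Using `case` patterns rather than a regex keeps the sketch free of external tools; a stricter script could additionally cross-check names against `fessctl role list` output.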
+ +## Examples + +```bash +# Minimal create: issue a token for an admin-API role with a hard expiration +fessctl accesstoken create \ + --name ci-pipeline \ + --permission '{role}admin-api' \ + --expires 2026-12-31T23:59:59 +``` + +```bash +# Update: rename and replace permissions on an existing token (ACCESSTOKEN_ID_HERE is a placeholder; copy the real ID from `list`) +fessctl accesstoken update ACCESSTOKEN_ID_HERE \ + --name ci-pipeline-prod \ + --permission '{role}admin-api' \ + --permission '{group}release' +``` + +```bash +# List and filter via JSON output +fessctl accesstoken list --size 200 --output json \ + | jq '.response.settings[] | select(.expires == null or .expires < "2026-06-01") | {id, name, expires, permissions}' +``` + +## See also + +- fess-docs: en/15.6/admin/accesstoken-guide.rst +- workflows.md: n/a +- Related features: `references/features/user.md`, `references/features/role.md`, plus cross-link `references/authentication.md` diff --git a/skills/fessctl/references/features/fileauth.md b/skills/fessctl/references/features/fileauth.md new file mode 100644 index 0000000..764f3d3 --- /dev/null +++ b/skills/fessctl/references/features/fileauth.md @@ -0,0 +1,96 @@ +# File Authentication Credentials (fessctl `fileauth`) + +## What it is + +File Authentication credentials let the Fess crawler log in to network file sources that require authentication before content can be fetched. According to the Fess admin guide (`en/15.6/admin/fileauth-guide.rst`), Fess supports two authentication schemes for file crawling: FTP and SAMBA (Windows shared folders / SMB). Each entry stores a hostname, port, scheme, username, password, and optional parameters, and is bound to a specific File Crawl configuration. + +In the admin UI, these records are managed under **Crawler > File Authentication**. Creating a record is essentially populating the same form fields that the UI exposes (Hostname, Port, Scheme, Username, Password, Parameters, File System Config). 
The `fessctl fileauth` subcommands wrap the equivalent admin API endpoints so the same credentials can be created, listed, retrieved, updated, and deleted from automation. + +A File Authentication entry is meaningless on its own — it must reference an existing File Crawl configuration (`file_config_id`) for the crawler to consult it when visiting matching SMB/FTP URLs. + +## When to use + +- Crawling a Windows shared folder (`smb://server/share/...`) that requires a domain user instead of guest access. +- Crawling an internal FTP server where anonymous login is disabled. +- Rotating the password used by the crawler for an existing SMB or FTP source without recreating the File Crawl configuration. +- Auditing or exporting all file-credential entries (e.g. for review) via `list --output json`. + +## Subcommand surface + +| Subcommand | Purpose | Required arguments / options | +|------------|---------|------------------------------| +| `create` | Create a new FileAuth credential and bind it to a File Crawl config. | `--username`, `--file-config-id`; optional `--password`, `--hostname`, `--port`, `--protocol-scheme`, `--parameters`, `--created-by`, `--created-time`, `--output` | +| `update` | Update fields of an existing FileAuth (fetched first, then patched). | `config_id` arg; optional `--username`, `--password`, `--hostname`, `--port`, `--protocol-scheme`, `--parameters`, `--file-config-id`, `--updated-by`, `--updated-time`, `--output` | +| `delete` | Delete a FileAuth by ID. | `config_id` arg; optional `--output` | +| `get` | Retrieve a single FileAuth by ID with full detail rendering. | `config_id` arg; optional `--output` | +| `list` | List FileAuth records with paging. | Optional `--page/-p`, `--size/-s`, `--output/-o` | + +Always reconfirm with `fessctl fileauth --help`. 
+ +## Resource JSON shape + +```json +{ + "crud_mode": 1, + "username": "crawler-user", + "password": "secret", + "hostname": "fileserver.example.com", + "port": 445, + "protocol_scheme": "smb", + "parameters": "domain=CORP", + "file_config_id": "FILE_CONFIG_ID_HERE", + "created_by": "admin", + "created_time": 1714694400000 +} +``` + +Required on create: `username`, `file_config_id`. `crud_mode` is set internally (`1` for create, `2` for update). On update, `updated_by` and `updated_time` replace the created counterparts and the existing record is fetched first so unspecified fields are preserved. + +## Relationships + +- Depends on `fileconfig`: a FileAuth must reference an existing File Crawl configuration via `--file-config-id`. Create the `fileconfig` first, then attach credentials. +- The crawler matches the auth entry by hostname/port/scheme when fetching URLs that belong to the bound `fileconfig`. +- Parallel to `webauth` (HTTP/Web crawler credentials) and `dataconfig` credentials, but only consulted by the file crawler. +- Deleting a `fileconfig` does not implicitly clean up its FileAuth rows; remove them explicitly via `fessctl fileauth delete`. + +## Gotchas + +- Secret hygiene: `--password` is passed on the command line and may be captured in shell history. Prefer setting it via a wrapper script that reads from a secret store, or update the value out-of-band in the UI. +- `--protocol-scheme` accepts values such as `smb` or `ftp` (the help text lists `file, smb` as examples). Match the scheme used by the bound `fileconfig` URLs; mismatches mean the crawler will never consult the credential. +- For SMB / Windows shares, the NTLM domain is supplied through `--parameters` using the form `domain=FUGA` (per the admin guide). It is not a separate field. +- `--port` is optional; omit it to let the crawler use the protocol default (e.g. 445 for SMB, 21 for FTP). If you do specify it, ensure it matches the share's actual listener. 
+- `update` performs a read-modify-write: if the record was changed concurrently in the UI, your update may overwrite those edits. The `version_no` returned by `get` reflects the optimistic-locking version maintained by Fess. +- `created_time` / `updated_time` default to "now" in milliseconds; only override them when migrating data. +- The list response key is `settings` (plural), and IDs are opaque strings — always copy them from `list` or `get` output rather than constructing them. + +## Examples + +```bash +# Minimal create: SMB credential bound to an existing FileConfig +fessctl fileauth create \ + --username crawler-user \ + --password 's3cret!' \ + --hostname fileserver.example.com \ + --port 445 \ + --protocol-scheme smb \ + --parameters 'domain=CORP' \ + --file-config-id FILE_CONFIG_ID_HERE +``` + +```bash +# Update only the password for an existing FileAuth +fessctl fileauth update FILEAUTH_ID_HERE \ + --password 'rotated-password' +``` + +```bash +# List all FileAuths and filter for SMB entries on a specific host +fessctl fileauth list --size 200 --output json \ + | jq '.response.settings[] | select(.protocol_scheme == "smb" and .hostname == "fileserver.example.com")' +``` + +## See also + +- fess-docs: en/15.6/admin/fileauth-guide.rst +- workflows.md: n/a +- Related features: `references/features/fileconfig.md` diff --git a/skills/fessctl/references/features/group.md b/skills/fessctl/references/features/group.md new file mode 100644 index 0000000..9f71658 --- /dev/null +++ b/skills/fessctl/references/features/group.md @@ -0,0 +1,93 @@ +# Groups (fessctl `group`) + +## What it is + +The Group feature manages the organizational buckets that users belong to in Fess. It is reached through the admin UI menu path **User -> Group**, where you can list, create, edit, and delete group definitions. A group has only two meaningful fields: a `name` and an optional free-form `attributes` map. 
Groups are commonly used when integrating with an external directory (LDAP/Active Directory) so that the directory's group membership can be mirrored inside Fess. + +In Fess's authorization model a *user* is the principal that authenticates, a *role* is a permission grant attached to crawl configurations and documents (issued via permission strings such as `{role}admin`), and a *group* is a separate organizational primitive that can also be referenced as a permission string (`{group}engineering`). Groups do not by themselves grant any specific capability inside the admin UI; they exist so that crawl configs, documents, and users can express access in terms of "anyone in group X." + +The `fessctl group` command exposes the same admin REST API that the UI uses, which makes it suitable for scripted provisioning, LDAP mirror jobs, and bulk audits. + +## When to use + +- Mirror an LDAP/AD organizational unit into Fess so that crawl-config permissions can reference it. +- Create department-style buckets (for example `engineering`, `sales`, `legal`) that several users will share. +- Audit which groups exist before assigning them to a user via `fessctl user create --group`. +- Clean up stale groups left over after a re-org, after first detaching them from any users that still reference them. + +## Subcommand surface + +| Subcommand | Purpose | Key arguments / options | +|------------|---------|-------------------------| +| `create` | Create a new group. | `name` (arg, max 100 chars), `--attribute/-a key=value` (repeatable), `--output/-o` | +| `update` | Update an existing group (fetched then merged). | `group_id` (arg), `--attribute/-a key=value` (repeatable), `--updated-by`, `--updated-time`, `--output/-o` | +| `delete` | Delete a group by ID. | `group_id` (arg), `--output/-o` | +| `getbyname` | Resolve a group by name (URL-safe base64 encodes the name and calls `get`). | `name` (arg), `--output/-o` | +| `get` | Retrieve a group's full record by ID. 
| `group_id` (arg), `--output/-o` | +| `list` | List groups with pagination. | `--page/-p` (default 1), `--size/-s` (default 100), `--output/-o` | + +Always reconfirm with `fessctl group --help`. + +## Resource JSON shape + +```json +{ + "crud_mode": 2, + "id": "ZW5naW5lZXJpbmc", + "name": "engineering", + "attributes": { + "description": "Product engineering org", + "ou": "Engineering", + "owner": "alice" + }, + "updated_by": "admin", + "updated_time": 1714694400000, + "version_no": 1 +} +``` + +Required on create: `name` only. `attributes` is an optional free-form `key=value` map (commonly used to carry LDAP fields such as `description`, `ou`, or `owner`); the server does not enforce a schema on it. `id` is server-assigned and is the URL-safe base64 of the name. `updated_by`, `updated_time`, and `version_no` are managed by the API and only need to be supplied on `update`. + +## Relationships + +- A **user** references group names (strings) via its `groups` array; see `references/features/user.md`. +- A **role** is an independent permission grant; groups and roles are parallel, not hierarchical. See `references/features/role.md`. +- A group may be referenced by **permission strings** of the form `{group}name` inside the `permissions` field of webconfig, fileconfig, and dataconfig records, and inside the per-document `permissions` indexed by the crawler. The match is by name string, not by ID. +- The full effective permission set of a logged-in user is the union of all `{role}...` and `{group}...` entries derived from their roles and groups. + +## Gotchas + +- Group names are limited to 100 characters and are matched by **name string** everywhere they appear (user records, crawl-config permission lists, document permissions). Renaming a group therefore requires rewriting every reference; the admin API does not cascade renames. +- Deletion order matters: detach the group from every user (`fessctl user update ... 
--group ...`) and remove `{group}name` permission strings from any webconfig/fileconfig/dataconfig before deleting the group itself, otherwise you will leave dangling permission strings that grant no access but still appear in records. +- `update` performs a read-modify-write: passing `--attribute` **replaces** the entire attribute map; omit the flag to keep the existing attributes untouched. There is no per-key delete flag. +- `getbyname` works by URL-safe base64-encoding the name and delegating to `get`; if the encoded ID is unknown the call returns the same not-found error as `get`. +- The admin UI exposes only `name`; `attributes` are settable via the API/CLI and are useful when mirroring LDAP, but they are not displayed on the standard UI form in 15.6. +- `--updated-time` defaults to "now" in epoch milliseconds and is only worth overriding for deterministic imports or replays. +- Deleting a group never deletes the users that referenced it; it only orphans the reference. + +## Examples + +```bash +# Minimal create: just a group name +fessctl group create engineering +``` + +```bash +# Typical update: replace the attribute map on an existing group +fessctl group update ZW5naW5lZXJpbmc \ + --attribute description='Product engineering org' \ + --attribute ou=Engineering \ + --attribute owner=alice +``` + +```bash +# List all groups as JSON and filter for those tagged with an LDAP ou attribute +fessctl group list --size 500 --output json \ + | jq '.response.settings[] | select(.attributes.ou != null) | {id, name, ou: .attributes.ou}' +``` + +## See also + +- fess-docs: en/15.6/admin/group-guide.rst +- workflows.md: Recipe 4 +- Related features: `references/features/user.md`, `references/features/role.md` diff --git a/skills/fessctl/references/features/role.md b/skills/fessctl/references/features/role.md new file mode 100644 index 0000000..27bb1e6 --- /dev/null +++ b/skills/fessctl/references/features/role.md @@ -0,0 +1,92 @@ +# Roles (fessctl `role`) + +## What it 
is + +Roles let you group users for the purpose of access control inside Fess. A role is a thin object — essentially just a name plus optional attributes — but it becomes meaningful when referenced from users, groups, and crawl configurations (web/file/data). Roles are especially useful when integrating with LDAP or other external identity systems, where role names typically map to directory groups. + +In the admin UI, roles are managed under **User > Role** (left sidebar). The list page shows all roles; clicking a role name opens its edit screen, and a Delete button is available there as well. + +`fessctl role` exposes the same CRUD surface over the admin REST API, so role provisioning can be scripted alongside user/group setup. + +## When to use + +- Preparing access-controlled search by creating roles such as `admin`, `sales`, or `engineering` before assigning them to users. +- Mirroring LDAP/AD groups into Fess so that crawled documents can be tagged with the same role names. +- Restricting a `webconfig`/`fileconfig`/`dataconfig` to a subset of users by attaching roles via the permissions field (`{role}name`). +- Auditing or exporting the current role list as JSON/YAML for backup or diffing across environments. + +## Subcommand surface + +| Subcommand | Purpose | +|---|---| +| `create` | Create a new role with a name (max 100 chars) and optional `--attribute key=value` pairs. | +| `update` | Update an existing role by ID. Re-fetches current state, then applies new attributes / `--updated-by` / `--updated-time`. | +| `delete` | Delete a role by ID. | +| `getbyname` | Convenience: look up a role by its name (URL-safe base64 encodes the name and delegates to `get`). | +| `get` | Retrieve a single role by ID and render details. | +| `list` | List roles with `--page` / `--size` pagination. | + +Always reconfirm with `fessctl role --help`. 
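The `getbyname` row says the name is URL-safe base64 encoded before `get` is called. A rough shell equivalent is sketched below to make that concrete. The function name `name_to_id` is hypothetical, and stripping the `=` padding is an assumption: it matches the unpadded ID `ZW5naW5lZXJpbmc` shown for `engineering` in the group reference, but fessctl's exact padding behavior is not confirmed here, so prefer `getbyname` itself in real scripts.

```bash
# Hypothetical helper (not part of fessctl): URL-safe base64 of a name.
# Assumption: '=' padding is dropped; verify against `fessctl role getbyname`.
name_to_id() {
  printf '%s' "$1" | base64 | tr '+/' '-_' | tr -d '=\n'
}

name_to_id engineering   # prints ZW5naW5lZXJpbmc
echo
```

This also shows why names with unusual characters are awkward: the encoded form changes completely with any difference in the name, so a typo yields a not-found error rather than a near match.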
+ +## Resource JSON shape + +```json +{ + "name": "sales", // required, max 100 chars + "attributes": { // optional, default {}; free-form key/value map + "ldap_dn": "cn=sales,ou=groups,dc=example,dc=com" + }, + "id": "AY...", // optional, default server-assigned on create; required for update/delete + "version_no": 1, // optional, default managed by server (optimistic locking) + "crud_mode": 2, // optional, default set by fessctl on update (2 = edit) + "updated_by": "admin", // optional, default "admin" on update + "updated_time": 1735689600000 // optional, default current UTC epoch millis on update +} +``` + +## Relationships + +- Referenced by `user` resources via the `roles` array: a user inherits all permissions implied by their roles. +- Referenced by `group` resources via the `roles` array: group membership can carry roles transitively. +- Referenced by `webconfig`, `fileconfig`, and `dataconfig` permission strings using the `{role}name` syntax (e.g., `{role}admin`). Documents crawled under such a config become searchable only by users who hold that role. +- Role names also appear in search-time access control: query-time role filtering uses the same names. + +## Gotchas + +- **Name length**: limited to 100 characters; longer names are rejected by the API. +- **Naming conventions**: stick to lowercase alphanumerics and avoid spaces/special characters, because role names appear verbatim inside permission strings like `{role}sales` and in URL-safe base64 lookups (`getbyname`). +- **Deletion order**: before deleting a role, remove or update any `user`/`group` that lists it in `roles`, and any `webconfig`/`fileconfig`/`dataconfig` whose `permissions` references `{role}name`. Otherwise you will leave dangling references that silently exclude documents from search results. +- **Optimistic locking**: `version_no` is managed by the server. `update` re-fetches the role first to obtain the current version; do not hand-edit it. 
+- **`getbyname`**: encodes the name with URL-safe base64 before calling the `get` endpoint — confirm the role actually exists with `list` if `getbyname` returns not-found. +- **Built-in `admin` role**: the bundled administrator account ships with an `admin` role; renaming or deleting it can lock you out of the admin UI. +- **Attribute schema**: `attributes` is free-form. Keys are not validated by the server, so typos silently succeed. + +## Examples + +Minimal create: + +```bash +fessctl role create sales +``` + +Typical update (replace attributes on an existing role): + +```bash +fessctl role update AYxxxxxxxxxxxxxxxxxx \ + --attribute ldap_dn='cn=sales,ou=groups,dc=example,dc=com' \ + --attribute description='Sales team' \ + --updated-by admin +``` + +List and filter via JSON output: + +```bash +fessctl role list --size 200 --output json \ + | jq -r '.response.settings[] | select(.name | startswith("sales")) | "\(.id)\t\(.name)"' +``` + +## See also + +- fess-docs: en/15.6/admin/role-guide.rst +- workflows.md: Recipe 4 (provisioning a user with admin permissions) +- Related features: `references/features/user.md`, `references/features/group.md` diff --git a/skills/fessctl/references/features/webauth.md b/skills/fessctl/references/features/webauth.md new file mode 100644 index 0000000..32f12fd --- /dev/null +++ b/skills/fessctl/references/features/webauth.md @@ -0,0 +1,109 @@ +# Web Authentication Credentials (fessctl `webauth`) + +## What it is + +Web Authentication entries store the credentials Fess presents to protected web sites during a crawl. Each row binds a username/password (and optional realm, port, scheme parameters) to a specific Web Crawl Config so that the crawler can fetch pages that require login. Without a matching `webauth` row, requests to protected URLs return 401/403 and the crawler skips them. + +In the admin UI this corresponds to **Crawler -> Web Authentication** (left menu). 
Records are managed independently of the web config rows but cannot exist without one — the relationship is enforced by `web_config_id`. + +Fess supports four authentication schemes for web crawling: + +- **BASIC** — RFC 7617 HTTP Basic auth. +- **DIGEST** — RFC 7616 HTTP Digest auth. +- **NTLM** — NT LAN Manager auth (Windows / corporate intranets); requires `workstation` and `domain` via the Parameters field. +- **FORM** — HTML form-based login (handled via the `form_scheme` config on the related web config; the credentials live here). + +## When to use + +- Crawling a private corporate portal (SharePoint, intranet wiki, JIRA) that sits behind BASIC or NTLM auth. +- Ingesting pages from a legacy web app that prompts via DIGEST. +- Crawling a SaaS or CMS site whose content is only visible after a FORM login. +- Rotating a service-account password without touching the underlying web config. + +## Subcommand surface + +| Command | Decorator | Purpose | Key inputs | +|----------|---------------------------------|-------------------------------------------------|----------------------------------------------------------------------------| +| `create` | `@webauth_app.command("create")` | Register a new credential row | `--username`, `--web-config-id` (required); `--password`, `--hostname`, `--port`, `--auth-realm`, `--protocol-scheme`, `--parameters`, `--created-by`, `--created-time`, `--output/-o` | +| `update` | `@webauth_app.command("update")` | Modify an existing credential by ID | `config_id` (positional); same optional fields as create plus `--web-config-id`, `--updated-by`, `--updated-time`, `--output/-o` | +| `delete` | `@webauth_app.command("delete")` | Remove a credential by ID | `config_id` (positional); `--output/-o` | +| `get` | `@webauth_app.command("get")` | Fetch a single credential row | `config_id` (positional); `--output/-o` | +| `list` | `@webauth_app.command("list")` | List credentials (paged) | `--page/-p` (default 1), `--size/-s` (default 100), 
`--output/-o` | + +Always reconfirm with `fessctl webauth --help`. + +## Resource JSON shape + +```json +{ + "id": "abc123", + "crud_mode": 1, + "hostname": "intranet.example.com", + "port": 443, + "auth_realm": "Corporate", + "protocol_scheme": "BASIC", + "username": "svc-crawler", + "password": "REDACTED", + "parameters": "workstation=WS01\ndomain=CORP", + "web_config_id": "WCID-XYZ", + "created_by": "admin", + "created_time": 1714723200000, + "updated_by": "admin", + "updated_time": 1714723200000, + "version_no": 1 +} +``` + +Required on create: `username`, `web_config_id`. All other fields are optional but at least `password` and `protocol_scheme` are needed for a working credential. `port = -1` means "any port". + +## Relationships + +- Depends on `webconfig`: every `webauth` row binds to a single Web Crawl Config via `web_config_id`. Create the webconfig first, then attach credentials. +- Multiple `webauth` rows may share a single `web_config_id` (e.g., one per hostname or realm). +- For FORM authentication, the form-submission scheme itself is configured on the related webconfig; only the credentials live in `webauth`. +- NTLM relies on the `parameters` field for `workstation` / `domain`; these are part of the auth contract, not the webconfig. + +## Gotchas + +- **Secret hygiene**: passing a literal `--password` value puts the secret in shell history and the process list. Reading it from a file or into a variable first (e.g., `read -rs WEBAUTH_PW`, then `--password "$WEBAUTH_PW"`) keeps it out of shell history, but the expanded value is still visible in the process list while the command runs; clear the variable afterwards. +- **Scheme casing**: Fess expects upper-case scheme tokens (`BASIC`, `DIGEST`, `NTLM`, `FORM`). Lower-case values may be accepted but are not idiomatic. +- **NTLM domain syntax**: use newline-separated `workstation=...` / `domain=...` inside `--parameters`. Quote the whole value in the shell, e.g. `--parameters $'workstation=WS01\ndomain=CORP'`. +- **Port semantics**: `--port -1` means "match any port"; omitting `--port` does the same. Do not set `0`. 
+- **Hostname omitted**: if `--hostname` is absent, the credential applies to every host reachable via the bound webconfig, which is handy but easy to over-share. +- **Deletion does NOT cascade**: removing a `webauth` row leaves the webconfig it references (`web_config_id`) intact. Conversely, deleting the webconfig does not remove orphaned webauth rows; clean them up explicitly. +- **Update preserves unspecified fields**: `update` first re-fetches the existing row, so omitted flags keep their stored values. Use this to rotate just the password without resending hostname/port. + +## Examples + +Minimal create (BASIC auth bound to an existing webconfig): + +```bash +fessctl webauth create \ + --username svc-crawler \ + --password 'S3cretPa$$' \ + --web-config-id WCID-XYZ \ + --hostname intranet.example.com \ + --port 443 \ + --protocol-scheme BASIC +``` + +Update: rotate the password without touching anything else (`WEBAUTH_ID_HERE` is a placeholder for the ID shown by `list`): + +```bash +read -rs NEW_PW +fessctl webauth update WEBAUTH_ID_HERE --password "$NEW_PW" --updated-by admin +unset NEW_PW +``` + +List and filter via JSON output (find all credentials bound to one webconfig): + +```bash +fessctl webauth list --output json \ + | jq '.response.settings[] | select(.web_config_id == "WCID-XYZ") | {id, username, hostname, port}' +``` + +## See also + +- fess-docs: en/15.6/admin/webauth-guide.rst +- workflows.md: Recipe 1 +- Related features: `references/features/webconfig.md` From a8f835412b478c0e66e120d6a23fe7e345f8d538 Mon Sep 17 00:00:00 2001 From: Shinsuke Sugaya Date: Sun, 3 May 2026 21:40:01 +0900 Subject: [PATCH 10/16] feat(skill): add feature references batch 3 (labeltype, keymatch, boostdoc, elevateword, badword) --- skills/fessctl/references/features/badword.md | 92 +++++++++++++++ .../fessctl/references/features/boostdoc.md | 96 ++++++++++++++++ .../references/features/elevateword.md | 106 ++++++++++++++++++ .../fessctl/references/features/keymatch.md | 100 +++++++++++++++++ .../fessctl/references/features/labeltype.md | 87 ++++++++++++++ 5 files changed, 481 
insertions(+) create mode 100644 skills/fessctl/references/features/badword.md create mode 100644 skills/fessctl/references/features/boostdoc.md create mode 100644 skills/fessctl/references/features/elevateword.md create mode 100644 skills/fessctl/references/features/keymatch.md create mode 100644 skills/fessctl/references/features/labeltype.md diff --git a/skills/fessctl/references/features/badword.md b/skills/fessctl/references/features/badword.md new file mode 100644 index 0000000..8b64f91 --- /dev/null +++ b/skills/fessctl/references/features/badword.md @@ -0,0 +1,92 @@ +# Bad Word (fessctl `badword`) + +## What it is + +Bad words are terms the administrator wants to suppress from the **Suggest** feature in Fess. Once registered, the term is filtered out of suggestion candidates the next time the suggest index is rebuilt or reloaded, so end users never see the term as a typeahead/autocomplete entry. + +The Fess admin UI exposes this under **Suggest > Bad Word** (left sidebar). Each entry holds a single `suggest_word` value and standard audit columns (`created_by`, `created_time`, `updated_by`, `updated_time`, `version_no`). The CSV download/upload screens accept a single-column file containing one word per line. + +Note: the official documentation (`badword-guide.rst`) describes the scope strictly as "exclude it from Suggest." Bad words do **not** rewrite or block primary search results; for that, use the role/permission system or content filtering. Verify the runtime behavior of your specific Fess version before promising end users that bad words will also disappear from full-text search hits. + +## When to use + +- Suppress profanity or slurs from suggest output on a public-facing search portal. +- Hide internal codenames, project aliases, or unreleased product names that leaked into the suggest index from crawled documents. +- Remove personally identifying tokens (phone numbers, employee IDs) that the suggest collector picked up from logs. 
+- Quickly silence a trending but undesirable query term reported by support, without retraining or recrawling. + +## Subcommand surface + +| Subcommand | Purpose | Key arguments / options | +|---|---|---| +| `create` | Register a new bad word. | `--suggest-word` (required, no whitespace), `--created-by`, `--created-time`, `--output/-o` | +| `update` | Modify an existing bad word entry. | `config_id` (positional), `--suggest-word`, `--updated-by`, `--updated-time`, `--output/-o` | +| `delete` | Remove a bad word by ID. | `config_id` (positional), `--output/-o` | +| `get` | Show one bad word entry. | `config_id` (positional), `--output/-o` | +| `list` | Page through registered bad words. | `--page/-p` (default 1), `--size/-s` (default 100), `--output/-o` | + +Always reconfirm with `fessctl badword --help`. + +## Resource JSON shape + +```json +{ + "id": "AYabc123XyzExample", + "crud_mode": 1, + "suggest_word": "example-bad-term", + "created_by": "admin", + "created_time": 1714694400000, + "updated_by": "admin", + "updated_time": 1714694400000, + "version_no": 1 +} +``` + +Required on create: `suggest_word` (string, no whitespace). `crud_mode` is set internally by fessctl (`1` for create, `2` for update). Timestamps are epoch milliseconds in UTC. `id` and `version_no` are managed by Fess. + +## Relationships + +- **Opposite of `elevateword`**: elevate words *boost* terms in the suggest index, while bad words *suppress* them. +- Depends on the **Suggest** subsystem being enabled (`suggest.search.log` / `suggest.documents` collectors and a populated suggest index). +- Effective only after the suggest index is rebuilt or reloaded — typically via the `SuggestIndexer` scheduler job (see `scheduler.py`) or a manual rebuild from **Suggest > Maintenance**. +- Audit fields (`created_by`, `updated_by`) are free-form strings; pair with `user`/`role` features only for traceability, not enforcement. 
+- Does not interact with `boostdoc`, `keymatch`, or `dict/synonym`; those operate on the search index, not the suggest index. + +## Gotchas + +- **Suggest reload required.** Adding or removing a bad word does not retroactively edit the live suggest index; trigger a suggest rebuild before verifying. +- **Whitespace forbidden.** The `--suggest-word` help text states "no whitespace allowed." Use a single token; multi-word phrases will be rejected or behave unexpectedly. +- **Casing and normalization.** Suggest matching applies the analyzer chain configured for the suggest index (lowercasing, ICU folding, etc.). Register the word in the form the analyzer produces, otherwise it may not match candidates. +- **Exact vs prefix.** Bad-word filtering matches the normalized token, not arbitrary substrings. A registered word will not suppress longer compound suggestions that merely contain it as a substring. +- **Search results unaffected.** End users can still type the word and retrieve full-text matches; this feature only hides typeahead suggestions. +- **CSV upload/download is UI-only.** fessctl currently does not implement upload/download (see `# TODO` markers at the bottom of `badword.py`). Use the admin UI for bulk imports. +- **Timestamp options auto-fill to "now."** Override `--created-time` / `--updated-time` only when reproducing an exact audit trail; otherwise let fessctl populate them. 
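The "whitespace forbidden" rule above can be checked before the CLI is ever invoked. The following is a minimal Python sketch, assuming a hypothetical helper named `build_badword_create_args` (not part of fessctl); it only assembles the argument list for `fessctl badword create --suggest-word`, the one flag used here that is documented above.

```python
import re


def build_badword_create_args(word: str) -> list[str]:
    """Return an argv list for `fessctl badword create`.

    Hypothetical helper for illustration only: it rejects empty words and
    words containing any whitespace, mirroring the "no whitespace allowed"
    note in the --suggest-word help text.
    """
    if not word or re.search(r"\s", word):
        raise ValueError(
            f"suggest word must be a single token without whitespace: {word!r}"
        )
    return ["fessctl", "badword", "create", "--suggest-word", word]
```

The returned list could be handed to `subprocess.run` in a provisioning script; failing early in the caller avoids relying on how the server reports a rejected multi-word phrase.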
+ +## Examples + +Minimal create: + +```bash +fessctl badword create --suggest-word "exampleterm" +``` + +Update an existing entry by ID: + +```bash +fessctl badword update AYabc123XyzExample \ + --suggest-word "renamedterm" \ + --updated-by "ops-bot" +``` + +List and filter via JSON output piped through `jq` (e.g., extract just the IDs and words): + +```bash +fessctl badword list --output json --size 200 \ + | jq -r '.response.settings[] | [.id, .suggest_word] | @tsv' +``` + +## See also + +- fess-docs: en/15.6/admin/badword-guide.rst +- workflows.md: Recipe 5 +- Related features: `references/features/elevateword.md` diff --git a/skills/fessctl/references/features/boostdoc.md b/skills/fessctl/references/features/boostdoc.md new file mode 100644 index 0000000..4e1f927 --- /dev/null +++ b/skills/fessctl/references/features/boostdoc.md @@ -0,0 +1,96 @@ +# Boost Document (fessctl `boostdoc`) + +## What it is + +Document Boost lets you reorder search results so that documents matching a URL condition consistently rank higher (or lower) regardless of the search keywords. Each rule pairs a Groovy condition over the document URL with a Groovy boost-value expression; when a crawled document matches the condition, its index-time boost factor is multiplied by the expression's result. This means rules are evaluated **at index time**, not query time — re-crawling (or re-indexing) is required for changes to take effect on existing documents. + +In the admin UI, rules are managed under **Crawler > Document Boost** (left sidebar). The list page shows each configured rule, and clicking the URL expression opens its edit screen where the boost value and sort order can be tuned. + +`fessctl boostdoc` exposes the same CRUD surface over the Fess admin REST API, so boost rules can be provisioned alongside crawl configs and scheduler jobs in a scripted workflow. + +## When to use + +- Promote canonical product or pricing pages above blog mentions of the same keywords. 
+- Pin a "what's new" or release-notes URL pattern to the top during a launch window. +- Demote (boost < 1.0) archived or legacy URL prefixes that should still be indexed but rarely surfaced. +- Apply a tenant-wide weighting (e.g., docs from `support.example.com` slightly outranking forum posts). + +## Subcommand surface + +| Subcommand | Purpose | +|---|---| +| `create` | Create a new rule. Requires `--url-expr`, `--boost-expr`, `--sort-order`; optional `--created-by`, `--created-time`. | +| `update` | Update an existing rule by ID. Re-fetches current state, then applies any of `--url-expr`, `--boost-expr`, `--sort-order`, `--updated-by`, `--updated-time`. | +| `delete` | Delete a rule by ID. | +| `get` | Retrieve a single rule by ID and render details. | +| `list` | List rules with `--page` / `--size` pagination. | + +Always reconfirm with `fessctl boostdoc --help`. + +## Resource JSON shape + +```json +{ + "url_expr": "url.matches(\"https://www.example.com/products/.*\")", // required, Groovy boolean expression + "boost_expr": "2.0", // required, Groovy expression evaluating to a number + "sort_order": 1, // required, non-negative integer; controls evaluation order in the admin list + "id": "AY...", // server-assigned on create; required for update/delete + "version_no": 1, // managed by server (optimistic locking) + "crud_mode": 1, // 1 = create, 2 = edit; set automatically by fessctl + "created_by": "admin", // defaults to "admin" on create + "created_time": 1735689600000, // UTC epoch millis; defaults to now on create + "updated_by": "admin", // defaults to "admin" on update + "updated_time": 1735689600000 // UTC epoch millis; defaults to now on update +} +``` + +## Relationships + +- Applied **at index time** by the crawler/indexer pipeline — the resulting boost is baked into each document's stored boost factor, so existing documents are not retroactively re-scored when a rule changes. 
+- Depends on documents being indexed via `webconfig`, `fileconfig`, or `dataconfig`. A re-crawl (or full re-index) of affected URLs is required for new/changed rules to take effect. +- Complements `keymatch` (keyword-conditional pinning) and `elevateword` (suggest-time elevation), which both act at query time rather than index time. +- Sort order influences which rule is evaluated first when multiple `url_expr` patterns match the same URL. + +## Gotchas + +- **Groovy syntax**: both `url_expr` and `boost_expr` are Groovy. Inside `url.matches("...")` the regex must be a full match, and backslashes need escaping in JSON payloads (`\\.` → `\\\\.`). +- **Re-crawl required**: because boosting happens at index time, editing a rule does **not** rescore already-indexed documents. Re-crawl the matched URLs (or run a full re-index) for the change to be visible. +- **Boost magnitude**: very large values (e.g., `1000.0`) can crowd out keyword relevance entirely. Start small (`1.5`–`3.0`) and tune. +- **Demotion**: use a fractional value (e.g., `0.1`) in `boost_expr` to push results down rather than promoting them. +- **Sort order conflicts**: when multiple rules match, the lower `sort_order` is applied first. Decide on a numbering scheme (e.g., increments of 10) up front to make later inserts easy. +- **Interaction with keymatch / elevateword**: those features run at query time and stack on top of any index-time boost. A document that is boosted here can still be re-pinned by a matching `keymatch` rule. +- **Optimistic locking**: `version_no` is server-managed. `update` re-fetches the rule first; do not hand-edit it. 
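Two of the gotchas above (full-match semantics and backslash escaping in JSON) are easy to get wrong, so here is a small Python sketch of both. It is an approximation under stated assumptions: `re.fullmatch` stands in for the whole-string behavior of `url.matches(...)`, Python and Java regex dialects differ in places, and the helper names `url_matches` and `to_json_payload` are hypothetical rather than anything fessctl or Fess provides.

```python
import json
import re


def url_matches(url: str, pattern: str) -> bool:
    """Approximate url.matches(pattern): the pattern must cover the whole URL."""
    return re.fullmatch(pattern, url) is not None


def to_json_payload(url_expr: str, boost_expr: str, sort_order: int) -> str:
    """Serialize a rule; json.dumps doubles every backslash in the expressions."""
    return json.dumps(
        {"url_expr": url_expr, "boost_expr": boost_expr, "sort_order": sort_order}
    )


# A prefix without a trailing `.*` does not match a longer URL.
prefix_only = r"https://www\.example\.com/products"
with_wildcard = r"https://www\.example\.com/products/.*"
page = "https://www.example.com/products/widget"
print(url_matches(page, prefix_only), url_matches(page, with_wildcard))

# The Groovy source `\\.` appears as `\\\\.` once embedded in JSON.
groovy_expr = r'url.matches("https://www\\.example\\.com/products/.*")'
print(to_json_payload(groovy_expr, "2.0", 10))
```

When the rule is passed through `--url-expr` on the command line, fessctl builds the JSON itself, so the extra escaping level only matters when payloads are written by hand.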
+ +## Examples + +Minimal create — promote all product pages with a 2x boost: + +```bash +fessctl boostdoc create \ + --url-expr 'url.matches("https://www.example.com/products/.*")' \ + --boost-expr '2.0' \ + --sort-order 10 +``` + +Update an existing rule to demote an archive prefix: + +```bash +fessctl boostdoc update AYxxxxxxxxxxxxxxxxxx \ + --url-expr 'url.matches("https://www.example.com/archive/.*")' \ + --boost-expr '0.2' \ + --sort-order 90 \ + --updated-by admin +``` + +List and filter via JSON output (find rules with non-default boost): + +```bash +fessctl boostdoc list --size 200 --output json \ + | jq -r '.response.settings[] | select(.boost_expr != "1.0") | "\(.sort_order)\t\(.id)\t\(.url_expr)\t\(.boost_expr)"' +``` + +## See also + +- fess-docs: en/15.6/admin/boostdoc-guide.rst +- workflows.md: Recipe 5 +- Related features: `references/features/keymatch.md`, `references/features/elevateword.md` diff --git a/skills/fessctl/references/features/elevateword.md b/skills/fessctl/references/features/elevateword.md new file mode 100644 index 0000000..6f06267 --- /dev/null +++ b/skills/fessctl/references/features/elevateword.md @@ -0,0 +1,106 @@ +# Elevate Word (fessctl `elevateword`) + +## What it is + +Elevate Words (also called "Additional Words") are operator-curated entries that are pushed into the Fess suggest dictionary so that they appear at—or near—the top of keyword autocomplete results. Normally, suggest entries are derived automatically from search queries and indexed content; elevate words let administrators inject preferred terms that may not yet have enough query/content signal to surface organically. + +In the Fess Admin UI, manage these entries from **Suggest > Additional Word** (`/admin/elevateword/`). Each entry pairs a `suggest_word` with a numeric `boost`; higher boost values rank the term higher in the suggest dropdown. 
Entries can optionally carry a `reading` (for kana / phonetic matching), and can be scoped by labels (`target_label`, `label_type_ids`) and roles (via `permissions`). + +The `fessctl elevateword` subcommand group is a thin wrapper around the admin REST API (`/api/admin/elevateword`). It supports CRUD operations and is suitable for scripted, declarative management of the elevate-word dictionary across environments. + +## When to use + +- Promote a brand or product name in autocomplete the moment it ships, before organic query signal exists (for example, a freshly launched SKU code). +- Ensure event names, campaign keywords, or seasonal terms surface at the top of suggestions during a marketing window. +- Guarantee that internal jargon, acronyms, or tenant-specific terminology appears for users behind a particular role or label scope. +- Bulk-seed a curated suggest dictionary as part of an environment provisioning pipeline (dev / staging / prod parity). + +## Subcommand surface + +| Subcommand | Purpose | Key required inputs | +|------------|---------|---------------------| +| `create` | Create a new elevate word entry. | `--suggest-word`, `--boost`, `--version-no` | +| `update` | Update an existing entry by ID; only provided fields are overwritten. | positional `config_id` (other flags optional) | +| `delete` | Delete an entry by ID. | positional `config_id` | +| `get` | Retrieve a single entry by ID. | positional `config_id` | +| `list` | List entries with paging. | `--page`, `--size` (both optional, defaults `1` / `100`) | + +All commands accept `--output / -o` with `text` (default), `json`, or `yaml`. Always reconfirm with `fessctl elevateword --help`. + +## Resource JSON shape + +The payload sent to the admin API (and returned in `response.setting`) follows this shape. `suggest_word` and `boost` are required; the rest are optional or system-managed. 
+ +```json +{ + "id": "abc123", + "crud_mode": 1, + "suggest_word": "FessSearch", + "reading": "fessusaachi", + "boost": 10.0, + "target_label": "product", + "label_type_ids": ["lbl-001", "lbl-002"], + "permissions": "{role}admin\n{role}guest", + "version_no": 1, + "created_by": "admin", + "created_time": 1714694400000, + "updated_by": "admin", + "updated_time": 1714694400000 +} +``` + +Notes verified against `commands/elevateword.py`: + +- `permissions` is sent as a newline-joined string (the CLI joins repeated `--permission` flags with `\n`). +- `label_type_ids` is a list, supplied via repeated `--label-type-id` flags. +- `created_time` / `updated_time` are epoch milliseconds (UTC); the CLI defaults to "now" if omitted. + +## Relationships + +- Entries are populated into the **suggest index** used by the suggest API and the autocomplete dropdown in the search UI. +- Depends on the **suggest feature being enabled** in `fess_config.properties` (`suggest.search.log`, `suggest.document.contents`, etc.). +- After create / update / delete, a **suggest reload / rebuild** is required for the new word to appear in autocomplete (verify via the Suggest admin page or the relevant scheduler job). +- Label scoping interacts with `labeltype` configurations; role scoping interacts with `role` and `group` definitions. +- Related curation features: **Bad Word** (suppresses suggestions) and **Key Match** (boosts whole search results, not just suggestions). + +## Gotchas + +- **Suggest reload required.** Creating an elevate word does not immediately rewrite the live suggest index; trigger a suggest update (admin UI button or the suggest indexer job) before validating. +- **`reading` vs surface form.** `suggest_word` is what users see; `reading` is used to match typed input (e.g., kana for Japanese). Omitting `reading` for non-ASCII terms can hurt matchability. 
+- **Label / role scoping is restrictive.** If `target_label` or `permissions` are set, the word only surfaces for users whose context matches; an empty scope means "visible to all". +- **Casing and tokenization** depend on the suggest analyzer; the literal `suggest_word` may be lowercased or otherwise normalized at index time. +- **`version_no` is required on create** by this CLI even though it is conceptually optimistic-locking metadata; pass `1` for a brand-new entry. +- **`update` is a merge.** The CLI fetches the current setting first, then overlays only the flags you provided, so you do not need to resend unchanged fields. +- **Permissions format.** Use the Fess permission syntax (e.g., `{role}admin`, `{user}alice`); repeat the `--permission` flag per entry. + +## Examples + +Minimal create (promote `FessSearch` with boost `10.0`): + +```bash +fessctl elevateword create \ + --suggest-word "FessSearch" \ + --boost 10.0 \ + --version-no 1 +``` + +Update an existing entry to raise its boost and add a reading: + +```bash +fessctl elevateword update abc123 \ + --boost 25.0 \ + --reading "fessusaachi" +``` + +List all elevate words and filter via `jq` (machine-readable output): + +```bash +fessctl elevateword list --output json \ + | jq '.response.settings[] | select(.boost >= 10) | {id, suggest_word, boost}' +``` + +## See also + +- fess-docs: en/15.6/admin/elevateword-guide.rst +- workflows.md: Recipe 5 +- Related features: `references/features/badword.md`, `references/features/keymatch.md` diff --git a/skills/fessctl/references/features/keymatch.md b/skills/fessctl/references/features/keymatch.md new file mode 100644 index 0000000..22383df --- /dev/null +++ b/skills/fessctl/references/features/keymatch.md @@ -0,0 +1,100 @@ +# Key Match (fessctl `keymatch`) + +## What it is + +Key Match is a Fess relevance-tuning feature that pins specific documents to the top of the search results whenever a user issues a particular search term. 
The administrator registers a search term, an internal query that selects the documents to promote, a maximum number of documents to elevate, and a boost value used to weight them above the natural ranking. + +In the Fess admin UI it is reached via **Crawler > Key Match**, where you can list, create, edit, and delete entries. A common use case described in the documentation is advertising or promoting curated answers for a specific keyword. + +`fessctl keymatch` exposes the same lifecycle through the admin API so that Key Match entries can be managed from scripts, CI pipelines, or migration tools alongside the rest of your Fess configuration. + +## When to use + +- Pinning a curated FAQ or canonical answer to the top whenever a known support query is searched (e.g. term `password reset`, query targeting the official password-reset page). +- Promoting branded or campaign content for navigational queries (e.g. term `pricing`, query selecting the pricing landing page). +- Boosting an internal announcement or policy document for a specific keyword without globally re-ranking the index. +- Overriding poor relevance for an ambiguous term while you investigate analyzer or synonym tuning. + +## Subcommand surface + +| Subcommand | Purpose | Key arguments | +| --- | --- | --- | +| `create` | Register a new Key Match entry | `--term`, `--query`, `--max-size`, `--boost`, `--version-no`, `--virtual-host`, `--created-by`, `--created-time`, `--output` | +| `update` | Modify an existing entry by ID | `config_id` (positional); optional `--term`, `--query`, `--max-size`, `--boost`, `--version-no`, `--virtual-host`, `--updated-by`, `--updated-time`, `--output` | +| `delete` | Remove a Key Match entry by ID | `config_id` (positional), `--output` | +| `get` | Show full detail of one entry | `config_id` (positional), `--output` | +| `list` | List Key Match entries with paging | `--page/-p`, `--size/-s`, `--output/-o` | + +Always reconfirm with `fessctl keymatch --help`. 
+ +## Resource JSON shape + +```json +{ + "crud_mode": 1, + "term": "support", + "query": "url:https://example.com/support/*", + "max_size": 10, + "boost": 100.0, + "virtual_host": "search.example.com", + "version_no": 1, + "created_by": "admin", + "created_time": 1714694400000, + "updated_by": "admin", + "updated_time": 1714694400000 +} +``` + +Required when creating: `term`, `query`, `max_size`, `boost`, `version_no`. `virtual_host` is optional and only sent when provided. `created_by` / `created_time` (and `updated_by` / `updated_time` on update) default to `admin` and the current UTC time in milliseconds. + +## Relationships + +- Operates on documents already present in the search index; it does not crawl or alter source content. +- Takes effect at search time, evaluated by the same Fess instance that serves the query. +- When `virtual_host` is set, the entry only applies to searches arriving through that virtual host (see the Fess virtual host configuration guide). +- The boost value combines with the regular relevance score, so very large boosts effectively pin the matched documents while smaller ones nudge ordering. +- Sits alongside other ranking tools: `boostdoc` (document-level boosting), `elevateword` (term-level promotion), and `relatedquery` (query suggestions). + +## Gotchas + +- Key Match changes do not take effect immediately for live search; the Key Match cache must be reloaded (typically via the admin UI or by restarting Fess) for new or updated entries to be applied. +- The `query` field is a Fess query string (e.g. `url:...`, `title:...`); a malformed query yields no promoted documents and no obvious error in search results. +- The `term` is matched against the user's search term; matching follows the same analyzer behavior as ordinary searches, so casing and tokenization can affect when an entry triggers. +- `virtual_host` scopes an entry to one host. 
An entry without `virtual_host` applies broadly; mixing scoped and unscoped entries can produce surprising overlaps. +- `max_size` caps the number of pinned documents; if the query matches more documents than `max_size`, only that many are elevated. +- Very large `boost` values dominate the score and may hide otherwise relevant results; tune carefully. +- `version_no` is required on `create` and used for optimistic locking on `update`; supply the value returned by `get` to avoid conflicts. + +## Examples + +Minimal create: + +```bash +fessctl keymatch create \ + --term "support" \ + --query "url:https://example.com/support/*" \ + --max-size 10 \ + --boost 100 \ + --version-no 1 +``` + +Update an existing entry (raise the boost and change the query): + +```bash +fessctl keymatch update KM_ID_HERE \ + --query "url:https://example.com/support/index.html" \ + --boost 200 +``` + +List entries and filter to those whose term contains `support` using `jq`: + +```bash +fessctl keymatch list --output json \ + | jq '.response.settings[] | select(.term | test("support"))' +``` + +## See also + +- fess-docs: en/15.6/admin/keymatch-guide.rst +- workflows.md: Recipe 5 (search relevance tuning) +- Related features: `references/features/boostdoc.md`, `references/features/elevateword.md`, `references/features/relatedquery.md` diff --git a/skills/fessctl/references/features/labeltype.md b/skills/fessctl/references/features/labeltype.md new file mode 100644 index 0000000..72f2f1f --- /dev/null +++ b/skills/fessctl/references/features/labeltype.md @@ -0,0 +1,87 @@ +# Label Types (fessctl `labeltype`) + +## What it is +Label Types classify documents in the Fess search index. Each label has a human-readable `name` (shown to end users) and a machine-readable `value` (the identifier persisted on each document). 
Inclusion/exclusion is driven by URL regular-expression patterns: any crawled document whose URL matches an `included_paths` regex (and does not match `excluded_paths`) is tagged with that label at index time. + +In the admin UI, labels are managed at `Crawler > Label`. Once at least one label is registered, a label pull-down appears in the search options so end users can filter results, and the same labels surface as facets/refinements alongside the result list. Labels are also referenced from web, file, and data store crawl configs to scope which crawl jobs may emit which labels. + +## When to use +- Categorize crawled web/file content by department, product, or site (e.g. "HR docs", "Engineering wiki") so the same Fess instance serves many audiences. +- Give end users a one-click facet to narrow a search to a known subset (e.g. "Manuals only" vs "All content"). +- Carve out a virtual-host-specific catalog when the same Fess instance is fronted by multiple hostnames. +- Combine with `--permission` to scope a label so only certain roles/groups/users see it in the dropdown and in results. + +## Subcommand surface +| Subcommand | Purpose | Required arguments | +| --- | --- | --- | +| `create` | Register a new label type. | `--name`, `--value`, `--version-no` | +| `update` | Modify an existing label type by ID; unspecified fields are preserved. | label type ID (positional) | +| `delete` | Permanently delete a label type by ID. | label type ID (positional) | +| `get` | Show one label type's full detail. | label type ID (positional) | +| `list` | Page through all label types (default page size 100). | none | + +Always reconfirm with `fessctl labeltype --help`.
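The inclusion/exclusion rule described under "What it is" can be sketched in a few lines of Python. This is an illustration, not Fess's implementation: it assumes whole-URL matching (approximated with `re.fullmatch`), assumes that an empty include list tags nothing, and uses a hypothetical function name, `label_applies`.

```python
import re


def label_applies(url: str, included: list[str], excluded: list[str]) -> bool:
    """Return True when a URL would be tagged under the assumptions above.

    A URL qualifies if at least one included pattern matches it and no
    excluded pattern does.
    """
    is_included = any(re.fullmatch(pattern, url) for pattern in included)
    is_excluded = any(re.fullmatch(pattern, url) for pattern in excluded)
    return is_included and not is_excluded


included = [r"https://wiki\.example\.com/eng/.*"]
excluded = [r"https://wiki\.example\.com/eng/private/.*"]
for candidate in (
    "https://wiki.example.com/eng/guide",
    "https://wiki.example.com/eng/private/keys",
    "https://wiki.example.com/hr/handbook",
):
    print(candidate, label_applies(candidate, included, excluded))
```

Running candidate URLs through a check like this before registering patterns helps catch unescaped dots and missing `.*` suffixes without waiting for a recrawl.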
+ +## Resource JSON shape +```json +{ + "crud_mode": 1, + "name": "Engineering", + "value": "eng", + "version_no": 1, + "sort_order": 0, + "included_paths": "https://wiki.example.com/eng/.*\nhttps://docs.example.com/eng/.*", + "excluded_paths": "https://wiki.example.com/eng/private/.*", + "permissions": "{role}admin\n{group}developer", + "virtual_host": "eng.search.example.com", + "created_by": "admin", + "created_time": 1714694400000, + "updated_by": "admin", + "updated_time": 1714694400000 +} +``` +Required: `name`, `value`, `version_no`. Optional: `sort_order`, `included_paths`, `excluded_paths`, `permissions`, `virtual_host`. The CLI accepts repeatable `--included-path`, `--excluded-path`, and `--permission` flags and joins them with newlines before posting (matching the admin UI's multi-line text areas). + +## Relationships +- Web crawl configs (`fessctl webconfig`) reference labels via `label_type_ids` so a single web crawl can tag every emitted document. +- File system crawl configs (`fessctl fileconfig`) reference labels the same way via `label_type_ids`. +- Data store configs (`fessctl dataconfig`) also accept `label_type_ids` to tag rows ingested from external sources. +- Labels surface in the search UI as a pull-down filter and in the JSON search response as facets/refinements. +- `permissions` entries follow the standard `{user}name`, `{role}name`, `{group}name` syntax shared with role/group management. +- `virtual_host` ties into the Virtual Host configuration documented under `config/security-virtual-host`. + +## Gotchas +- Labels are written onto documents at index time. Adding, renaming, or re-scoping a label has no effect on already-indexed content until you recrawl the affected configs. +- `value` is the identifier stored on every document and used in queries; restrict it to alphanumeric characters and avoid changing it after content has been indexed (existing docs would be orphaned). 
+- Deleting a label that is still listed in a crawl config's `label_type_ids` leaves dangling references. Either drop the label from each `webconfig`/`fileconfig`/`dataconfig` first, or accept that the references will resolve to nothing. +- `included_paths` and `excluded_paths` are regular expressions, not glob patterns; remember to escape `.` and anchor with `.*` where appropriate. +- `--version-no` is required on `create` and is used for optimistic concurrency on `update`; bump it deliberately when scripting bulk updates. +- `created_time`/`updated_time` are epoch milliseconds (UTC). The CLI auto-fills them with "now" if omitted. + +## Examples +```bash +# Minimal create: a label that tags everything under /docs/ +fessctl labeltype create \ + --name "Documentation" \ + --value docs \ + --version-no 1 \ + --included-path "https://www.example.com/docs/.*" +``` + +```bash +# Update: add an exclusion and restrict visibility to the developer group +fessctl labeltype update LABEL_ID \ + --excluded-path "https://www.example.com/docs/internal/.*" \ + --permission "{group}developer" +``` + +```bash +# List and filter via JSON output to find a label by value +fessctl labeltype list --output json \ + | jq '.response.settings[] | select(.value == "docs")' +``` + +## See also +- fess-docs: en/15.6/admin/labeltype-guide.rst +- workflows.md: Recipe 1 +- Related features: `references/features/webconfig.md`, `references/features/fileconfig.md`, `references/features/dataconfig.md` From 4e82566014c144e85c52261be0edb6c5287a671c Mon Sep 17 00:00:00 2001 From: Shinsuke Sugaya Date: Sun, 3 May 2026 21:42:31 +0900 Subject: [PATCH 11/16] feat(skill): add feature references batch 4 (relatedcontent, relatedquery, pathmap, duplicatehost, reqheader) --- .../references/features/duplicatehost.md | 88 +++++++++++++++++ skills/fessctl/references/features/pathmap.md | 92 ++++++++++++++++++ .../references/features/relatedcontent.md | 94 +++++++++++++++++++ .../references/features/relatedquery.md | 
91 ++++++++++++++++++ .../fessctl/references/features/reqheader.md | 80 ++++++++++++++++ 5 files changed, 445 insertions(+) create mode 100644 skills/fessctl/references/features/duplicatehost.md create mode 100644 skills/fessctl/references/features/pathmap.md create mode 100644 skills/fessctl/references/features/relatedcontent.md create mode 100644 skills/fessctl/references/features/relatedquery.md create mode 100644 skills/fessctl/references/features/reqheader.md diff --git a/skills/fessctl/references/features/duplicatehost.md b/skills/fessctl/references/features/duplicatehost.md new file mode 100644 index 0000000..f094ac1 --- /dev/null +++ b/skills/fessctl/references/features/duplicatehost.md @@ -0,0 +1,88 @@ +# Duplicate Host (fessctl `duplicatehost`) + +## What it is + +Duplicate Host configurations tell the Fess crawler to treat one or more hostnames as equivalent to a single canonical hostname. At crawl time, any URL whose host matches a registered duplicate name is rewritten so that its host becomes the configured regular (canonical) name. This prevents the same content from being fetched, stored, and indexed twice when it is reachable through multiple equivalent hostnames such as `www.example.com` and `example.com`. + +The feature is managed in the admin UI under **Crawler > Duplicate Hosts**. Each rule is a simple pair: a Regular Name (the canonical hostname that should appear in the index) and a Duplicate Name (the hostname that will be replaced). Rules are evaluated globally for every web crawl run by the server. + +Because the rewrite happens during URL normalization in the crawler, duplicate-host rules apply uniformly to all web crawl configurations on the instance and do not require per-config wiring. + +## When to use + +- Consolidate `www.example.com` and `example.com` (or other host aliases) into a single canonical host so duplicate documents are not indexed. 
+- Treat staging or mirror hostnames (for example `mirror.example.com`) as the production host for indexing purposes. +- Normalize legacy or rebranded hostnames after a domain migration so old URLs collapse onto the new canonical name. +- Combine multiple regional hostnames that serve identical content under one canonical name when intentional. + +## Subcommand surface + +| Subcommand | Purpose | Key inputs | +|------------|---------|------------| +| `create` | Register a new duplicate-host rule. | `--regular-name`, `--duplicate-host-name`, `--sort-order`, optional `--created-by`, `--created-time`, `--output` | +| `update` | Modify an existing rule by ID; only provided fields are changed. | `config_id` (positional), optional `--regular-name`, `--duplicate-host-name`, `--sort-order`, `--updated-by`, `--updated-time`, `--output` | +| `delete` | Remove a rule by ID. | `config_id` (positional), `--output` | +| `get` | Show details of a single rule by ID. | `config_id` (positional), `--output` | +| `list` | List rules with pagination. | `--page`/`-p`, `--size`/`-s`, `--output`/`-o` | + +Always reconfirm with `fessctl duplicatehost --help`. + +## Resource JSON shape + +```json +{ + "crud_mode": 1, + "regular_name": "www.example.com", + "duplicate_host_name": "example.com", + "sort_order": 1, + "created_by": "admin", + "created_time": 1730000000000 +} +``` + +On `update`, the client first fetches the existing setting, sets `crud_mode` to `2`, and overlays `updated_by` and `updated_time` plus any of `regular_name`, `duplicate_host_name`, or `sort_order` you supplied. Read responses (`get`, `list`) also expose `id` and `version_no` returned by the server. + +## Relationships + +- Rewriting is performed at crawl time during URL normalization, before fetch and storage. +- Rules apply globally to every web crawl; they are not bound to a specific `webconfig` entry. +- They affect what hostname is recorded in the index, so they interact with deduplication and reporting. 
+- Path-level normalization (`pathmap`) is independent and can be combined with duplicate-host rules. + +## Gotchas + +- Existing indexed documents are not rewritten. After adding or changing a rule, recrawl the affected sites and remove or expire stale documents to actually collapse duplicates. +- `regular_name` and `duplicate_host_name` are matched as host strings; ports, schemes, and paths are not part of the rule. +- IP addresses and FQDNs are distinct hosts. If a site is reachable both as `192.0.2.10` and `example.com`, register both directions as needed. +- Internationalized Domain Names should be supplied in the form the crawler observes (typically Punycode `xn--...`); mixing Unicode and ASCII forms can lead to non-matching rules. +- `sort_order` is required on `create` and must be a non-negative integer; it controls the order in admin listings, not match precedence. +- `created_time` and `updated_time` default to "now" in milliseconds (UTC). Override only when reproducing or backdating records. 
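Because rules match bare host strings, a full URL copied from a browser or a crawl log has to be reduced to its host before it is passed to `--regular-name` or `--duplicate-host-name`. A minimal sketch of that reduction in POSIX shell follows, assuming a hypothetical helper named `host_of` (it is not part of fessctl) and ignoring bracketed IPv6 literals:

```shell
# Hypothetical helper (not a fessctl command): reduce a URL to its bare host.
# Assumption: the input is a plain host or an http(s)-style URL; bracketed
# IPv6 literals such as [::1] are not handled.
host_of() {
  url=$1
  rest=${url#*://}      # drop the scheme, if present
  rest=${rest%%/*}      # drop the path
  rest=${rest%%\?*}     # drop a query string on path-less URLs
  rest=${rest##*@}      # drop userinfo (user:password@)
  printf '%s\n' "${rest%%:*}"   # drop the port
}

host_of 'https://example.com:8443/docs?x=1'   # prints example.com
```

The printed value can then be supplied to `--duplicate-host-name`; a name that is already a bare host passes through unchanged.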
+
+## Examples
+
+```bash
+# Minimal create: collapse example.com into www.example.com
+fessctl duplicatehost create \
+  --regular-name www.example.com \
+  --duplicate-host-name example.com \
+  --sort-order 1
+```
+
+```bash
+# Update an existing rule by ID (placeholder shown) to change the duplicate hostname
+fessctl duplicatehost update DUPLICATE_HOST_ID_HERE \
+  --duplicate-host-name old.example.com \
+  --sort-order 5
+```
+
+```bash
+# List rules and filter for a specific regular name via jq
+fessctl duplicatehost list --output json \
+  | jq '.response.settings[] | select(.regular_name == "www.example.com")'
+```
+
+## See also
+
+- fess-docs: en/15.6/admin/duplicatehost-guide.rst
+- workflows.md: n/a
+- Related features: `references/features/webconfig.md`, `references/features/pathmap.md`
diff --git a/skills/fessctl/references/features/pathmap.md b/skills/fessctl/references/features/pathmap.md
new file mode 100644
index 0000000..1f0d881
--- /dev/null
+++ b/skills/fessctl/references/features/pathmap.md
@@ -0,0 +1,92 @@
+# Path Mapping (fessctl `pathmap`)
+
+## What it is
+
+A Path Mapping is a regex-based URL rewrite rule applied by Fess at crawl time and/or display time. The administrator defines a Java regular expression and a replacement string; matching URLs are rewritten before being indexed, before being shown in search results, or while extracting links from HTML during web crawling. In the admin UI it lives at **Crawler > Path Mapping** in the left sidebar.
+
+Typical use is bridging the gap between how the crawler reaches content and how end users should see or click it. For example, the file crawler may reach documents at `file:/srv/documents/...` while users should see `http://fileserver.example.com/documents/...`. Path Mapping handles that translation declaratively, without modifying crawl configs or post-processing the index.
+
+The `fessctl pathmap` subcommand is a thin CRUD wrapper over the Fess admin API for these entries.
It is convenient for scripting environment migrations, normalizing URLs across Fess instances, and rolling out the same rewrite ruleset to multiple deployments. + +## When to use + +- Rewriting an internal file-server path (`file:/srv/...`) to a public web URL so search results are clickable. +- Keeping the original path in the index but presenting a friendlier URL only at display time (`Displaying` mode). +- Migrating a site between hostnames: rewrite extracted links from `old-server.example.com` to `new-server.example.com` so the crawler enqueues the new host (`Extracted URL Conversion`). +- Stripping or normalizing tracking parameters and trailing slashes so duplicate-looking URLs collapse to a single canonical form. + +## Subcommand surface + +| Subcommand | Purpose | Notes | +|------------|---------|-------| +| `create` | Create a new Path Mapping. | Requires `--regex` and `--process-type`. `--replacement` is optional but almost always wanted. | +| `update` | Update an existing Path Mapping by ID. | Takes positional `CONFIG_ID`. Read-modify-write: only fields explicitly passed are overwritten. | +| `delete` | Delete a Path Mapping by ID. | Irreversible. Already-indexed documents keep whatever URL was stored at index time. | +| `get` | Retrieve one Path Mapping by ID. | Renders a Markdown detail view by default; switch with `--output json` or `--output yaml`. | +| `list` | List Path Mappings. | Supports `--page` / `--size` pagination. Default page size is 100. | + +Always reconfirm with `fessctl pathmap --help`. 
+ +## Resource JSON shape + +```json +{ + "regex": "file:/srv/documents/", // required, Java regex + "replacement": "http://fileserver.example.com/documents/", // optional but typically required + "process_type": "C", // required: C=Crawling, D=Displaying, B=Both, E=Extracted URL + "user_agent": "", // optional, regex matched against UA; empty = all + "sort_order": 0, // optional, ascending application order + "created_by": "admin", // optional, default "admin" + "created_time": 1714723200000 // optional, defaults to now (epoch ms, UTC) +} +``` + +The `process_type` field is sent verbatim as supplied on the command line. Acceptable values follow the admin UI semantics: Crawling, Displaying, Crawling/Displaying (Both), and Extracted URL Conversion. Confirm the exact code your Fess version expects via `fessctl pathmap get --output json` against an entry created in the UI. + +## Relationships + +- Applied during **crawling** (rewrites URL before indexing) and/or **display** (rewrites URL when rendering search results) depending on `process_type`. +- Affects **all crawl configs** (Web, File, Data) globally; there is no per-config scoping. Use `user_agent` for narrower targeting on the web crawler. +- `Extracted URL Conversion` only affects the **web crawler** when extracting links from HTML; it does not apply to file or datastore crawlers. +- Interacts with `fessctl webconfig` and `fessctl fileconfig`: the rewritten URL is what ends up in the index, so permissions, label filters, and virtual hosts should be evaluated against the post-rewrite URL. +- Multiple mappings are applied in `sort_order` ascending; chain rules carefully so earlier rewrites do not invalidate later regex matches. + +## Gotchas + +- Changes to `Crawling` (or `Both`) mappings only affect documents indexed **after** the change. Existing documents keep their stored URL until re-crawled; trigger a recrawl to apply the new mapping retroactively. +- Regex syntax is **Java** regex, not POSIX or PCRE. 
Special characters (`.`, `?`, `+`, `(`, `)`) must be escaped with backslashes; use back references like `$1`, `$2` in the replacement. +- When passing patterns via shell, quote them in single quotes to prevent the shell from interpreting `$1`, backslashes, or parentheses. For example: `--regex 'http://old\.example\.com/(.*)' --replacement 'http://new.example.com/$1'`. +- `process_type` is sent as a raw string. Provide the value your Fess version stores (typically a single letter such as `C`, `D`, `B`, `E`); verify by inspecting an existing UI-created entry with `--output json` first. +- `user_agent` is itself a regex; an empty value matches all requests. Be precise to avoid unintentionally rewriting all crawler traffic. +- `Extracted URL Conversion` does not change URLs already saved in the index; it only affects what gets enqueued for crawling. +- `update` performs a read-modify-write: if the config ID does not exist, the command exits non-zero with a "not found" message. +- Deleting a mapping does not undo URL changes already written to the index; you need a recrawl to restore original URLs. 
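The quoting gotcha above can be checked without touching Fess. The sketch below uses two hypothetical shell functions (illustration only, not fessctl commands) to show what a replacement string looks like after the shell has processed it: inside double quotes `$1` is expanded as a positional parameter, while inside single quotes it survives for the regex engine to treat as a back reference.

```shell
# Hypothetical demo functions; nothing here calls fessctl.

# Double quotes: the shell substitutes $1 (here, the function's first
# argument) before the string would ever reach --replacement.
replacement_double_quoted() {
  printf '%s\n' "http://new.example.com/$1"
}

# Single quotes: $1 stays literal, so it can act as a back reference.
replacement_single_quoted() {
  printf '%s\n' 'http://new.example.com/$1'
}

replacement_double_quoted "leaked-arg"   # prints http://new.example.com/leaked-arg
replacement_single_quoted "leaked-arg"   # prints http://new.example.com/$1
```

In a script that has its own positional parameters, the double-quoted form would silently send the wrong replacement, which is why the single-quoted form is the safer default.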
+ +## Examples + +```bash +# Minimal create: rewrite file-server paths to a public web URL at index time +fessctl pathmap create \ + --regex 'file:/srv/documents/' \ + --replacement 'http://fileserver.example.com/documents/' \ + --process-type C +``` + +```bash +# Update an existing mapping: change replacement and bump sort order +fessctl pathmap update PM1a2b3c4d5 \ + --replacement 'https://files.example.com/docs/' \ + --sort-order 10 +``` + +```bash +# List all path mappings as JSON and filter only the display-time rules +fessctl pathmap list --size 200 --output json \ + | jq '.response.settings[] | select(.process_type == "D") | {id, regex, replacement, sort_order}' +``` + +## See also + +- fess-docs: en/15.6/admin/pathmap-guide.rst +- workflows.md: n/a +- Related features: `references/features/webconfig.md` diff --git a/skills/fessctl/references/features/relatedcontent.md b/skills/fessctl/references/features/relatedcontent.md new file mode 100644 index 0000000..e18ab16 --- /dev/null +++ b/skills/fessctl/references/features/relatedcontent.md @@ -0,0 +1,94 @@ +# Related Content (fessctl `relatedcontent`) + +## What it is + +Related Content is a Fess feature that displays a custom HTML or text snippet at the top of search results whenever the user's search query matches a configured search term. The administrator registers a term and the corresponding content body, and Fess injects that snippet alongside the natural search results when the term is matched. + +In the Fess admin UI it is reached via **Crawler > Related Content**, where you can list, create, edit, and delete entries. The configuration is intentionally small (term + content, optionally scoped by virtual host), making it useful for promotional banners, in-result advisories, or short editorial answers tied to a known query. 
+
+`fessctl relatedcontent` exposes the same lifecycle through the admin API so that Related Content entries can be managed from scripts, CI pipelines, or migration tools alongside the rest of your Fess configuration.
+
+## When to use
+
+- Showing a curated answer or banner ("Office is closed on May 5") whenever users search a known term.
+- Promoting a campaign or product page above the organic results for a navigational query (e.g. term `pricing`).
+- Surfacing a help-desk notice or maintenance message tied to a specific keyword without re-ranking the index.
+- Providing localized or virtual-host-specific in-result content by scoping entries with `virtual_host`.
+
+## Subcommand surface
+
+| Subcommand | Purpose | Key arguments |
+| --- | --- | --- |
+| `create` | Register a new Related Content entry | `--term`, `--content`, `--sort-order`, `--virtual-host`, `--created-by`, `--created-time`, `--output/-o` |
+| `update` | Modify an existing entry by ID | `config_id` (positional); optional `--term`, `--content`, `--sort-order`, `--virtual-host`, `--updated-by`, `--updated-time`, `--output/-o` |
+| `delete` | Remove a Related Content entry by ID | `config_id` (positional), `--output/-o` |
+| `get` | Show full detail of one entry | `config_id` (positional), `--output/-o` |
+| `list` | List Related Content entries with paging | `--page/-p`, `--size/-s`, `--output/-o` |
+
+Always reconfirm with `fessctl relatedcontent --help`.
+
+## Resource JSON shape
+
+```json
+{
+  "crud_mode": 1,
+  "term": "holiday",
+  "content": "Our office is closed May 3-5.",
+  "sort_order": 0,
+  "virtual_host": "search.example.com",
+  "created_by": "admin",
+  "created_time": 1714694400000,
+  "updated_by": "admin",
+  "updated_time": 1714694400000
+}
+```
+
+Required when creating: `term` and `content`. Optional: `sort_order` (defaults to `0`) and `virtual_host` (only sent when provided). `created_by` / `created_time` (and `updated_by` / `updated_time` on update) default to `admin` and the current UTC time in milliseconds. `id` and `version_no` are returned by the server and used by `update` for optimistic locking.
+
+## Relationships
+
+- Surfaced in the UI search results page when the user's query term matches the configured `term`.
+- Operates purely at search-render time; it does not affect the index, crawler, or document scoring.
+- When `virtual_host` is set, the entry only applies to searches arriving through that virtual host (see the Fess virtual host configuration guide).
+- `sort_order` controls relative ordering when multiple Related Content entries match the same query.
+- Complementary to ranking-side tools such as `keymatch` (document pinning), `boostdoc` (document-level boosting), and `relatedquery` (query suggestions); Related Content is the only one that injects raw markup.
+
+## Gotchas
+
+- Related Content changes do not take effect immediately for live search; the Related Content cache must be reloaded (typically via the admin UI or by restarting Fess) for new or updated entries to be applied.
+- The `content` field is rendered into the search results page; HTML is generally not escaped, so untrusted markup is an XSS risk. Only register content you trust, and escape user-supplied values yourself.
+- The `term` is matched against the user's search query; matching is sensitive to casing and tokenization, so an entry registered as `Pricing` may not fire for the query `pricing` depending on analyzer settings.
+- `virtual_host` scopes an entry to one host.
An entry without `virtual_host` applies broadly; mixing scoped and unscoped entries can produce overlapping snippets.
+- `sort_order` is a plain integer and is not auto-deduplicated across entries; assign distinct values when ordering matters.
+- `update` performs a read-modify-write: it first fetches the existing entry, so the target ID must already exist or the command exits non-zero.
+
+## Examples
+
+Minimal create:
+
+```bash
+fessctl relatedcontent create \
+  --term "holiday" \
+  --content "Our office is closed May 3-5."
+```
+
+Update an existing entry (change content and bump sort order):
+
+```bash
+fessctl relatedcontent update RC_ID_HERE \
+  --content "Office reopens May 6." \
+  --sort-order 10
+```
+
+List entries and filter to those whose term contains `holiday` using `jq`:
+
+```bash
+fessctl relatedcontent list --output json \
+  | jq '.response.settings[] | select(.term | test("holiday"))'
+```
+
+## See also
+
+- fess-docs: en/15.6/admin/relatedcontent-guide.rst
+- workflows.md: n/a
+- Related features: `references/features/relatedquery.md`, `references/features/keymatch.md`
diff --git a/skills/fessctl/references/features/relatedquery.md b/skills/fessctl/references/features/relatedquery.md
new file mode 100644
index 0000000..c854f7b
--- /dev/null
+++ b/skills/fessctl/references/features/relatedquery.md
@@ -0,0 +1,91 @@
+# Related Query (fessctl `relatedquery`)
+
+## What it is
+
+Related queries let administrators register alternative search terms that get surfaced when an end user submits a configured term. When the search frontend receives a query that matches the registered `term`, Fess promotes the configured `queries` as related-query suggestions, helping users discover phrasings that perform better against the indexed corpus.
+
+In the admin UI, related queries are managed from **Crawler > Related Query**. Each entry binds a single search term to one or more alternative query expressions and may be scoped to a specific virtual host so that different sites served by the same Fess instance can ship distinct related-query sets.
+
+The `fessctl relatedquery` command group wraps the `/api/admin/relatedquery` endpoints so configurations can be created, listed, retrieved, updated, and deleted from scripts without using the web UI.
+
+## When to use
+
+- Surface synonyms or alternative phrasings (for example, suggest `search engine` when users type `fess`).
+- Redirect common misspellings to canonical queries that have better recall.
+- Provide brand or product-name aliases for marketing campaigns.
+- Tune query coverage per virtual host when multiple sites share one Fess deployment.
+ +## Subcommand surface + +| Subcommand | Purpose | Required input | +|------------|---------|----------------| +| `create` | Register a new related query entry. | `--term`, `--queries`, `--version-no` | +| `update` | Modify an existing entry by ID, preserving untouched fields. | `config_id` argument; optional `--term`, `--queries`, `--virtual-host` | +| `delete` | Remove an entry by ID. | `config_id` argument | +| `get` | Fetch a single entry by ID. | `config_id` argument | +| `list` | Page through registered entries. | `--page`, `--size` (both optional) | + +Always reconfirm with `fessctl relatedquery --help`. + +## Resource JSON shape + +```json +{ + "id": "REL-XXXXXXXXXXXXXXXX", + "term": "fess", + "queries": "search engine\nfull text search", + "virtual_host": "site-a", + "version_no": 1, + "crud_mode": 1, + "created_by": "admin", + "created_time": 1735689600000, + "updated_by": "admin", + "updated_time": 1735689600000 +} +``` + +`term` and `queries` are required by the API. `virtual_host` is optional and only sent when supplied. `queries` is transmitted as a string; multiple alternatives are typically separated by newlines (escape as `\n` on the command line). + +## Relationships + +- Surfaced through the Fess search UI as related-query suggestions when the user-supplied query matches a registered `term`. +- Changes may require a related-query cache reload before they appear in search responses; restart or trigger a reload after bulk edits. +- `virtual_host` scopes the entry to a specific virtual host configuration; entries without `virtual_host` apply globally. +- Distinct from synonym dictionaries (which rewrite tokens at index/query time): related queries do not modify scoring, they only present alternative searches to the user. + +## Gotchas + +- After create/update/delete the related-query cache may need to be reloaded before the change is visible in search responses; plan a reload step in automation pipelines. 
+- The `term` is matched against the user query; casing and whitespace handling depend on Fess's matcher, so register the canonical form you observe in query logs. +- `queries` is a single string field on the wire even though it represents a list. Use newline separators (`$'\n'` in bash, `\n` inside JSON) so the UI renders each alternative on its own line. +- Related queries are not synonyms; they neither expand the underlying query nor rewrite tokens. If you need recall changes, use synonym/stopword dictionaries instead. +- `--version-no` is mandatory on `create`; supply `1` for new entries and trust optimistic locking to bump it on subsequent updates. +- `update` performs a read-modify-write cycle: the command first calls `get` and aborts if the ID is not found. + +## Examples + +```bash +# Minimal create: associate the term "fess" with two alternative queries. +fessctl relatedquery create \ + --term "fess" \ + --queries $'search engine\nfull text search' \ + --version-no 1 +``` + +```bash +# Update only the queries field for an existing entry. +fessctl relatedquery update REL-1234567890ABCDEF \ + --queries $'search engine\nopen source search\nenterprise search' +``` + +```bash +# List entries as JSON and filter for a specific virtual host with jq. 
+fessctl relatedquery list --size 200 --output json \ + | jq '.response.settings[] | select(.virtual_host == "site-a")' +``` + +## See also + +- fess-docs: en/15.6/admin/relatedquery-guide.rst +- workflows.md: Recipe 5 +- Related features: `references/features/relatedcontent.md`, `references/features/elevateword.md` diff --git a/skills/fessctl/references/features/reqheader.md b/skills/fessctl/references/features/reqheader.md new file mode 100644 index 0000000..0ad7b84 --- /dev/null +++ b/skills/fessctl/references/features/reqheader.md @@ -0,0 +1,80 @@ +# Request Header (fessctl `reqheader`) + +## What it is +Request Headers are extra HTTP headers that Fess attaches to outgoing fetches when its web crawler retrieves documents. Each header is a name/value pair bound to a specific Web Crawl Configuration (`web_config_id`), so the same crawler can target multiple sites while sending different headers to each one. + +Typical headers include `User-Agent`, `Cookie`, `Accept-Language`, `Authorization`, or any custom token (e.g. `X-Api-Key`) that the upstream site requires. The fess-docs guide notes these can be used to drive automatic login on systems whose authentication is keyed on header values. + +In the Admin UI this resource lives under "Crawler > Request Header". The `fessctl reqheader` subcommand exposes the same CRUD surface via the admin API. + +## When to use +- Inject an API token (`Authorization: Bearer ...` or `X-Api-Key: ...`) when crawling a protected internal endpoint that does not use HTTP Basic / Digest auth (which would belong to `webauth`). +- Override the default crawler `User-Agent` so analytics or WAF rules can identify Fess traffic separately from real users. +- Send `Accept-Language: ja-JP` (or similar) so a multilingual site returns the locale you want indexed. +- Pass a sticky `Cookie` header for sites whose session is provisioned out-of-band and merely needs to be replayed on each fetch. 
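For the token-injection case in the list above, the header value can be read from an environment variable rather than typed inline. The guard below is a minimal sketch, assuming a hypothetical helper `require_env` (not part of fessctl) and a variable name chosen purely for illustration; it stops a script early when the token is missing instead of registering an empty header.

```shell
# Hypothetical helper: succeed only if the named variable is set and non-empty.
require_env() {
  name=$1
  # Indirect lookup; only pass trusted, literal variable names to this function.
  eval "value=\${$name-}"
  if [ -z "$value" ]; then
    printf 'error: %s is not set\n' "$name" >&2
    return 1
  fi
}

# INTERNAL_API_TOKEN is an illustrative name, set here only for the demo.
INTERNAL_API_TOKEN="example-token"
require_env INTERNAL_API_TOKEN && echo "token present"
```

A provisioning script would call the guard once before any `fessctl reqheader create` invocation and abort on a non-zero return.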
+ +## Subcommand surface +| Subcommand | Required args | Key options | Purpose | +|---|---|---|---| +| `create` | `--name`, `--value`, `--web-config-id` | `--created-by`, `--created-time`, `--output/-o` | Register a new header bound to a web config. | +| `update` | `REQHEADER_ID` (positional) | `--name`, `--value`, `--web-config-id`, `--updated-by`, `--updated-time`, `--output/-o` | Patch fields on an existing header (fetched then merged). | +| `delete` | `REQHEADER_ID` (positional) | `--output/-o` | Remove the header by ID. | +| `get` | `REQHEADER_ID` (positional) | `--output/-o` | Show one header's details. | +| `list` | none | `--page/-p`, `--size/-s`, `--output/-o` | Paginated listing (default `size=100`). | + +Always reconfirm with `fessctl reqheader --help`. + +## Resource JSON shape +```json +{ + "crud_mode": 1, + "name": "X-Api-Key", + "value": "s3cr3t-token", + "web_config_id": "WEB-CONFIG-ID-HERE", + "created_by": "admin", + "created_time": 1735689600000, + "updated_by": "admin", + "updated_time": 1735689600000 +} +``` +Required on create: `name` (max 100), `value` (max 1000), `web_config_id` (max 1000). `crud_mode` is set automatically by fessctl (`1` for create, `2` for update). On `get`, the response also exposes audit fields (`id`, `version_no`) and may surface auxiliary fields rendered by the detail formatter (`hostname`, `port`, `auth_realm`, `protocol_scheme`, `username`, `password`, `parameters`). + +## Relationships +- Depends on `webconfig`: the `--web-config-id` MUST point to an existing Web Crawl Configuration (`fessctl webconfig list`). The header only fires for crawls run by that web config. +- Complementary to `webauth`: use `webauth` for Basic / Digest / NTLM / Form auth; use `reqheader` for token / cookie / custom header auth. +- Has no link to `fileconfig` or `dataconfig` — it applies only to web (HTTP) fetches. + +## Gotchas +- Secret hygiene: header values are stored in plain text in the Fess index. 
Treat tokens like credentials — do not paste them into shell history (`HISTFILE`), shared logs, or `--output json` dumps you commit to git. Consider sourcing the value from `$ENV_VAR` at the call site. +- A `User-Agent` reqheader overrides the default crawler UA configured in `fess_config.properties` for that web config only. Verify the change with a probe request before relying on it. +- `update` performs a fetch-then-merge: the current record is read, your `--name` / `--value` / `--web-config-id` overlay it, then the whole document is re-sent. Omitted flags are preserved, not cleared. +- Deleting a reqheader does NOT cascade to or modify the parent webconfig; the crawl simply stops sending that header on the next run. Conversely, deleting a webconfig leaves orphan reqheader rows referencing a missing `web_config_id` — clean them up explicitly. +- `created_time` / `updated_time` default to "now" in UTC milliseconds. Override only when reproducing or migrating data. +- `list` is paginated (`--page`, `--size`, default `size=100`); pipe through `jq` for filtering since there is no server-side `--name` filter flag. + +## Examples +Minimal create — bind an API key header to an existing web config: +```bash +fessctl reqheader create \ + --name "X-Api-Key" \ + --value "$INTERNAL_API_TOKEN" \ + --web-config-id "WEB-CFG-123abc" +``` + +Update the value (e.g. 
token rotation) without touching name or binding:
+```bash
+fessctl reqheader update REQHDR-456def \
+  --value "$NEW_INTERNAL_API_TOKEN" \
+  --updated-by "rotation-bot"
+```
+
+List and filter by web config via JSON output:
+```bash
+fessctl reqheader list --output json \
+  | jq '.response.settings[] | select(.web_config_id == "WEB-CFG-123abc")'
+```
+
+## See also
+- fess-docs: en/15.6/admin/reqheader-guide.rst
+- workflows.md: n/a
+- Related features: `references/features/webconfig.md`, `references/features/webauth.md`
From e4e6a607f4f634a08ba6313a59695d7ed01b6796 Mon Sep 17 00:00:00 2001
From: Shinsuke Sugaya
Date: Sun, 3 May 2026 21:42:31 +0900
Subject: [PATCH 12/16] feat(skill): add feature references batch 5 (joblog, crawlinginfo)

---
 .../references/features/crawlinginfo.md | 131 ++++++++++++++++++
 skills/fessctl/references/features/joblog.md | 94 +++++++++++++
 2 files changed, 225 insertions(+)
 create mode 100644 skills/fessctl/references/features/crawlinginfo.md
 create mode 100644 skills/fessctl/references/features/joblog.md

diff --git a/skills/fessctl/references/features/crawlinginfo.md b/skills/fessctl/references/features/crawlinginfo.md
new file mode 100644
index 0000000..5efa8a3
--- /dev/null
+++ b/skills/fessctl/references/features/crawlinginfo.md
@@ -0,0 +1,131 @@
+# Crawling Info (fessctl `crawlinginfo`)
+
+## What it is
+
+Crawling Info is the per-session metadata produced by Fess every time a crawl
+runs. The Admin UI exposes it under **System Info > Crawling Info**, where each row
+represents a single crawl session identified by its `session_id`. The page is
+intended for observation and post-mortem analysis, not for authoring.
+
+Each session record collects timing and counter data that the crawler emits
+internally: when the overall crawl started and ended, when the web/file and
+data-store phases started and ended, how long indexing took, how many
+documents were indexed, and the final crawler status (success or failure).
+These values are written by the crawler subsystem itself; users do not +populate them. + +Because rows are produced automatically by background scheduler runs, the +`fessctl crawlinginfo` surface is read-only: you can list sessions, fetch one +by ID, or delete stale records to free up storage. There is no `create` or +`update` verb — those operations are not exposed by the underlying admin API +and would be meaningless for crawler-emitted telemetry. + +## When to use + +- Monitor an in-flight crawl: list recent sessions to see whether the + scheduled crawler started, and which phase it is currently in. +- Post-mortem on a failed crawl: fetch the row for the failing `session_id` + to inspect the recorded crawler status, end time, and execution time. +- Capacity / housekeeping: periodically delete old or expired session records + so the `.crawler` indices do not grow without bound. +- Correlate with `joblog`: combine the script-level run information from + `joblog` with the crawler-internal counters in `crawlinginfo` to build a + full picture of a scheduler invocation. + +## Subcommand surface + +| Subcommand | Purpose | +| --- | --- | +| `list` | Page through crawling-info rows (`--page`, `--size`, `--output`). | +| `get` | Retrieve one row by `crawlinginfo_id` (`--output`). | +| `delete` | Remove one row by `crawlinginfo_id` (`--output`). | + +This resource is **read-only** with respect to authoring: there is no +`create` and no `update` subcommand, because crawling-info rows are +generated by the crawler itself. Always reconfirm with +`fessctl crawlinginfo --help`. + +## Resource JSON shape + +The admin API returns rows under `response.logs` (list) or `response.log` +(single). Fields are server-populated. 
+ +```json +{ + "id": "AY1n...e1Q", + "session_id": "20260503T120000-001", + "name": "CrawlerStartTime", + "value": "1714723200000", + "expired_time": 1717315200000, + "created_time": 1714723200000 +} +``` + +Notes: + +- `id` is the OpenSearch document id; pass it as `crawlinginfo_id` to `get` + and `delete`. +- `session_id` groups all rows for one crawl run; multiple `name`/`value` + pairs share a single `session_id`. +- `expired_time` is the epoch-millis after which the row is eligible for + automatic GC by Fess. + +## Relationships + +- Produced by `scheduler` jobs that invoke the crawler (default jobs: + `Default Crawler`, `Default File Crawler`, etc.). +- Complementary to `joblog`: `joblog` records the script-level run + (start/end, exit status of the scheduled job), while `crawlinginfo` + records crawler-internal counters and phase timestamps. +- Consumes data emitted by `webconfig`, `fileconfig`, and `dataconfig` + targets that the scheduled crawler walked. +- Stored alongside other crawler runtime state in the `.crawler` index + family; deleting a row does not undo any indexed search documents. + +## Gotchas + +- **Read-only resource.** There is no `fessctl crawlinginfo create` and no + `fessctl crawlinginfo update`. Do not look for them, and do not invent + flags. The CLI exposes exactly `list`, `get`, and `delete`. +- Rows accumulate across every scheduled crawl run. On busy installations + this index grows quickly; plan periodic cleanup via `delete`. +- The `expired_time` field drives Fess's own automatic garbage collection. + Manual `delete` is only needed when you want to reclaim space sooner than + the configured retention window. +- `delete` removes a single row by id, not a whole `session_id`. To purge + an entire crawl session you must iterate over every row that shares the + session id. +- The `value` field is always a string in the JSON payload, even for + numeric counters and millisecond timestamps. 
Cast on the client side
+  when post-processing.
+- Deleting a crawling-info row does not delete the indexed documents
+  produced by that crawl; it only removes the telemetry record.
+
+## Examples
+
+List the most recent crawling-info rows:
+
+```bash
+fessctl crawlinginfo list --page 1 --size 20
+```
+
+Fetch a single row by id:
+
+```bash
+fessctl crawlinginfo get AY1n0abc_examplee1Q --output json
+```
+
+Bulk-delete every expired session row using `--output json | jq | xargs`:
+
+```bash
+fessctl crawlinginfo list --size 1000 --output json \
+  | jq -r --argjson now "$(date +%s000)" \
+    '.response.logs[] | select(.expired_time != null and .expired_time < $now) | .id' \
+  | xargs -I {} fessctl crawlinginfo delete {}
+```
+
+## See also
+
+- fess-docs: en/15.6/admin/crawlinginfo-guide.rst
+- workflows.md: Recipe 3 (failing crawl investigation)
+- Related features: `references/features/scheduler.md`, `references/features/joblog.md`
diff --git a/skills/fessctl/references/features/joblog.md b/skills/fessctl/references/features/joblog.md
new file mode 100644
index 0000000..648cb75
--- /dev/null
+++ b/skills/fessctl/references/features/joblog.md
@@ -0,0 +1,94 @@
+# Job Log (fessctl `joblog`)
+
+## What it is
+
+Job Log records the execution history of every scheduler run in Fess. Each row captures the job name, execution status, target environment, the script that was executed, and the output (stdout/stderr) produced by the run. The Admin UI exposes this under **System Info > Job Log** (left sidebar).
+
+Joblog rows are produced by every scheduled (or manually triggered) scheduler run when `job_logging` is enabled on the scheduler entry. When a scheduler job starts, Fess writes a new job log row with a start timestamp and `RUNNING` status; when the run finishes, the row is updated with the end time, terminal status, and the captured `script_result` body.
+ +Use `fessctl joblog` to inspect this history programmatically — typically to debug failures, audit which jobs ran when, or feed results into downstream monitoring. + +## When to use + +- Debugging a failed crawl: pull the recent log for the crawler scheduler and read the captured `script_result` to see the stack trace or error message. +- Auditing job history: list runs over the last N pages to confirm whether nightly jobs (Default Crawler, Suggest Indexer, Purge Doc By Query, etc.) actually fired. +- Monitoring integration: emit `--output json` and pipe through `jq` to alert on any row whose `job_status` is not `OK`. +- Cleaning up noisy history: delete oversized or obsolete log entries when the index gets large. + +## Subcommand surface + +The `joblog` resource is read-and-delete only. There are no `create` or `update` verbs — joblog rows are written exclusively by the scheduler runtime in Fess itself. + +| Subcommand | Purpose | Key arguments / options | +|------------|---------|--------------------------| +| `list` | List job log rows (paginated, newest first depending on backend ordering). | `--page/-p` (default `1`), `--size/-s` (default `100`), `--output/-o` (`text`, `json`, `yaml`) | +| `get` | Retrieve a single job log row by ID, including the full captured script output. | `joblog_id` (positional), `--output/-o` | +| `delete` | Delete one job log row by ID. Useful for trimming oversized rows. | `joblog_id` (positional), `--output/-o` | + +Always reconfirm with `fessctl joblog --help`. + +## Resource JSON shape + +Field names below match what the Fess admin API returns and what `fessctl joblog get` renders. `id` is auto-generated by Fess on job start. 
+ +```json +{ + "id": "AY7k0lJ4q9xQwExample", + "job_name": "Default Crawler", + "job_status": "OK", + "target": "all", + "script_type": "groovy", + "script_data": "return container.getComponent(\"crawlJob\").execute();", + "script_result": "Job execution log line 1\nJob execution log line 2\n...", + "start_time": 1714694400000, + "end_time": 1714698000000 +} +``` + +Notes: +- `start_time` and `end_time` are UNIX epoch milliseconds. `fessctl joblog get` and `list` render them as UTC ISO 8601 via `to_utc_iso8601`. +- `job_status` is a short enum string used by the scheduler runtime — typically `OK`, `FAIL`, or `RUNNING` (a row in flight will not yet have `end_time`). +- `script_result` contains the raw stdout/stderr captured during the run and can be very large for long crawls. +- `target` mirrors the scheduler entry's `target` field (e.g. `all`, or a specific executor label). + +## Relationships + +- Produced by `scheduler` runs — every job entry managed via `fessctl scheduler` that has `job_logging` enabled writes one joblog row per execution. See `references/features/scheduler.md`. +- Complements `crawlinginfo` for crawl-specific session metrics (counts, timings per session). Joblog gives the script-level view; crawling info gives the per-crawl-session view. +- Independent of `accesstoken`, `user`, `role` — joblog has no ACL of its own beyond admin access. + +## Gotchas + +- Joblog is append-only history from the API surface — there is no `update` verb, only `delete`. Trim by deleting individual rows. +- `script_result` bodies can be huge (megabytes) for long crawls; prefer `--output json | jq` and slice fields when scripting, instead of dumping the whole row to a terminal. +- Timestamps are epoch milliseconds in raw API responses; the CLI converts them for display, but if you consume `--output json` downstream you must convert yourself. 
+- `job_status` values are case-sensitive enum strings produced by the scheduler — filter on the exact spelling (`OK`, `FAIL`, `RUNNING`). +- A row in `RUNNING` state has no `end_time` yet — guard for `null` when computing durations. +- Pagination: `list` defaults to page 1, size 100. Walk pages explicitly to scan deep history; the API does not stream. + +## Examples + +Recent joblog rows (default page 1, size 100): + +```bash +fessctl joblog list +``` + +Inspect one job log entry by ID, including the captured `script_result`: + +```bash +fessctl joblog get AY7k0lJ4q9xQwExample +``` + +Find failing runs across the most recent 200 entries via JSON output and `jq`: + +```bash +fessctl joblog list --size 200 --output json \ + | jq '.response.logs[] | select(.job_status != "OK") | {id, job_name, job_status, start_time}' +``` + +## See also + +- fess-docs: en/15.6/admin/joblog-guide.rst +- workflows.md: Recipe 3 (failing crawl investigation) +- Related features: `references/features/scheduler.md`, `references/features/crawlinginfo.md` From 767416b0d44a3b205927214e43a6ac8e03dabb85 Mon Sep 17 00:00:00 2001 From: Shinsuke Sugaya Date: Sun, 3 May 2026 21:46:38 +0900 Subject: [PATCH 13/16] refactor(skill): remove .claude/skills/fessctl in favor of skills/fessctl The .claude/skills/ tree is the install location for third-party skills (see 'gh skill publish' warning). Authored skills belong under skills/ per the agentskills.io specification. The new skills/fessctl/ replaces the prior single-file skill with a SKILL.md + 6 cross-cutting refs + 22 per-feature refs structure (29 files total). 
--- .claude/skills/fessctl/SKILL.md | 172 -------------------------------- 1 file changed, 172 deletions(-) delete mode 100644 .claude/skills/fessctl/SKILL.md diff --git a/.claude/skills/fessctl/SKILL.md b/.claude/skills/fessctl/SKILL.md deleted file mode 100644 index b3d604f..0000000 --- a/.claude/skills/fessctl/SKILL.md +++ /dev/null @@ -1,172 +0,0 @@ ---- -name: fessctl -description: Manage Fess search engine via fessctl CLI. Use for CRUD operations on web configs, file configs, data configs, schedulers, users, roles, groups, and more. ---- - -## Overview - -fessctl is a CLI tool for managing Fess search engine via the admin API. It supports 22 resource types with standard CRUD operations. - -### Prerequisites - -- Python 3.13+ -- fessctl installed (`pip install fessctl` or `uv pip install fessctl`) -- Fess server running with API access enabled - -## Environment Setup - -```bash -export FESS_ENDPOINT="http://localhost:8080" # Fess server URL -export FESS_ACCESS_TOKEN="your-access-token" # API access token -export FESS_VERSION="v1" # API version (default: v1) -``` - -## Output Formats - -- `text` (default) - Markdown tables and structured output, AI-friendly -- `json` - Raw JSON from API -- `yaml` - YAML formatted output - -Use `-o json` for programmatic parsing, `-o text` for human/AI readable output. 
- -## Command Reference - -| Resource | Commands | Description | -| --- | --- | --- | -| accesstoken | create, update, delete, get, list | API access tokens | -| badword | create, update, delete, get, list | Bad words for suggest | -| boostdoc | create, update, delete, get, list | Document boost rules | -| crawlinginfo | delete, get, list | Crawling session info | -| dataconfig | create, update, delete, get, list | Data store configs | -| duplicatehost | create, update, delete, get, list | Duplicate host mappings | -| elevateword | create, update, delete, get, list | Promoted search words | -| fileauth | create, update, delete, get, list | File auth credentials | -| fileconfig | create, update, delete, get, list | File crawl configs | -| group | create, update, delete, get, getbyname, list | User groups | -| joblog | delete, get, list | Job execution logs | -| keymatch | create, update, delete, get, list | Key match rules | -| labeltype | create, update, delete, get, list | Label types | -| pathmap | create, update, delete, get, list | Path mappings | -| relatedcontent | create, update, delete, get, list | Related content | -| relatedquery | create, update, delete, get, list | Related queries | -| reqheader | create, update, delete, get, list | Request headers | -| role | create, update, delete, get, getbyname, list | User roles | -| scheduler | create, update, delete, get, list, start, stop | Job schedulers | -| user | create, update, delete, get, getbyname, list | Users | -| webauth | create, update, delete, get, list | Web auth credentials | -| webconfig | create, update, delete, get, list | Web crawl configs | - -## Common Workflows - -### Health Check - -```bash -fessctl ping -``` - -### Web Crawl Setup - -```bash -# 1. Create web config -fessctl webconfig create --name "My Site" --url "https://example.com" -o json - -# 2. Find the default crawler scheduler -fessctl scheduler list - -# 3. Start crawling -fessctl scheduler start - -# 4. 
Monitor job logs -fessctl joblog list -``` - -### User Management - -```bash -# Create role -fessctl role create "editor" - -# Create group -fessctl group create "team-a" - -# Create user with role and group -fessctl user create "john" "password123" --role "editor" --group "team-a" - -# Look up user by name -fessctl user getbyname "john" -``` - -### File Crawl Setup - -```bash -# Create file config -fessctl fileconfig create --name "Docs" --path "/data/docs" - -# Add file auth if needed -fessctl fileauth create --username "user" --file-config-id --password "pass" -``` - -## Response Structure - -- `response.status` == 0 means success -- `response.id` contains the resource ID on create -- `response.setting` contains single resource data (get) -- `response.settings` contains resource list (list) -- Exception: crawlinginfo and joblog use `response.log` / `response.logs` - -## Important Patterns - -- **Update**: internally does GET then PUT (merges existing data) -- **List pagination**: `--page` (default 1) and `--size` (default 100) -- **Permissions**: use `--permission "{role}guest"` format -- **Multi-value fields**: repeat the option (e.g., `--url "http://a" --url "http://b"`) - -## Complete Examples - -```bash -# Create a web config with multiple URLs and labels -fessctl webconfig create \ - --name "Corporate Site" \ - --url "https://www.example.com" \ - --url "https://blog.example.com" \ - --excluded-url "(?i).*(css|js|jpeg|jpg|gif|png)" \ - --depth 3 \ - --max-access-count 10000 \ - --num-of-thread 3 \ - --interval-time 5000 \ - -o json - -# Update a web config -fessctl webconfig update --name "Updated Name" --depth 5 - -# Delete a web config -fessctl webconfig delete - -# Get details -fessctl webconfig get - -# List with pagination -fessctl webconfig list --page 1 --size 50 - -# Create a scheduled job -fessctl scheduler create \ - --name "Nightly Crawl" \ - --target "all" \ - --script-type "groovy" \ - --cron-expression "0 0 2 * * ?" 
\ - --script-data "return container.getComponent(\"crawlJob\").execute();" - -# Create a boost rule -fessctl boostdoc create \ - --url-expr "https://important.example.com/.*" \ - --boost-expr "100" \ - --sort-order 1 - -# Create a key match -fessctl keymatch create \ - --term "FAQ" \ - --query "title:FAQ" \ - --max-size 3 \ - --boost 10.0 \ - --version-no 1 -``` From 7c44e4982d7531d966fe1072fef1b805a738b618 Mon Sep 17 00:00:00 2001 From: Shinsuke Sugaya Date: Sun, 3 May 2026 22:00:26 +0900 Subject: [PATCH 14/16] fix(skill): correct CLI usage in cross-cutting references Code-review surfaced systematic issues in the hand-written cross-cutting reference files (per-feature subagent-generated files were correct): - json output is the raw Fess admin API response (.response.settings[], .response.setting, .response.logs[]); rewrite all jq pipelines that assumed a {success,data} or bare-array shape. - Resource IDs are positional arguments. No fessctl subcommand accepts --id; remove every --id flag from examples and document the convention. - Recipe 1 (initial crawl): webauth has no --name; webauth requires --web-config-id, so order is webconfig -> webauth, not the reverse. Fix --included-urls -> --included-url. Use 'scheduler start ' instead of pointing at --help. - Recipe 3 (failure investigation): use snake_case API fields (job_name, job_status, script_result, session_id, etc.). - Recipe 4 (user provisioning): role/group/user create take name and password as positional arguments. The user flag is --role/--group (singular, repeatable), not --roles/--groups. - Recipe 5 (relevance tuning): boostdoc requires --sort-order; keymatch/elevateword/relatedquery require --version-no; relatedquery --queries takes a single newline-joined string, not a JSON array. - Authentication: drop the invented 'Radmin-api' permission; use the documented {role}/{group}/{user} permission syntax. Use --permission (singular), not --permissions. 
- conventions.md: add joblog (also read-only), getbyname (user/role/group), scheduler start/stop to the exceptions list. - installation.md: 'uv pip install -e src' -> '.' (pyproject is at root). - features/dataconfig.md: drop the bogus 'fessctl plugin install' tip; fessctl 0.1.0 does not wrap plugin admin. --- skills/fessctl/references/authentication.md | 13 +- skills/fessctl/references/conventions.md | 33 ++-- .../fessctl/references/features/dataconfig.md | 2 +- skills/fessctl/references/installation.md | 4 +- skills/fessctl/references/output-formats.md | 69 ++++++-- skills/fessctl/references/troubleshooting.md | 21 ++- skills/fessctl/references/workflows.md | 165 +++++++++++------- 7 files changed, 193 insertions(+), 114 deletions(-) diff --git a/skills/fessctl/references/authentication.md b/skills/fessctl/references/authentication.md index 8383e86..c25f30c 100644 --- a/skills/fessctl/references/authentication.md +++ b/skills/fessctl/references/authentication.md @@ -18,7 +18,7 @@ Defaults live in `src/fessctl/config/settings.py`. The defaults are conservative 1. Browse to `${FESS_ENDPOINT}/admin/` and sign in with an admin account. 2. Open **System → Access Token**. -3. Click **Create New**, give the token a name (e.g. `claude-cli`), and select the permissions it needs. For most fessctl operations the `Radmin-api` permission is required. +3. Click **Create New**, give the token a name (e.g. `claude-cli`), and pick a parameter name plus the permissions it should grant. Permissions are Fess permission strings of the form `{role}`, `{group}`, or `{user}` — for full administrative access bind the token to a role that already has admin rights, for example `{role}admin`. The exact syntax is documented in `fess-docs/en/15.6/admin/accesstoken-guide.rst`. 4. Save and copy the generated token value. It is shown only once. ### Via fessctl (after you already have one admin token) @@ -26,7 +26,7 @@ Defaults live in `src/fessctl/config/settings.py`. 
The defaults are conservative ```bash fessctl accesstoken create \ --name claude-cli \ - --permissions "Radmin-api" + --permission "{role}admin" # repeat --permission to grant more ``` See `references/features/accesstoken.md` for the full subcommand surface (list, get, update, delete). @@ -56,11 +56,12 @@ Never commit a token to git, and never include it in chat output, log files, or Fess access tokens are bearer tokens. The token's permissions are fixed at creation time; rotating permissions means issuing a new token and deleting the old one. Tokens do not auto-rotate. If a token leaks, delete it immediately via: ```bash -fessctl accesstoken list --output json | jq '.[] | select(.name=="claude-cli")' -fessctl accesstoken delete --id +fessctl accesstoken list --output json \ + | jq '.response.settings[] | select(.name=="claude-cli")' +fessctl accesstoken delete ``` -Most fessctl subcommands require admin-equivalent permission. A token issued for a non-admin user will succeed at `ping` and may succeed at some read-only `list` operations but will fail with `403 Forbidden` on `create`/`update`/`delete`. +Most fessctl subcommands require admin-equivalent permission. A token bound to a non-admin role will succeed at `ping` and may succeed at some read-only `list` operations but will fail with `403 Forbidden` on `create`/`update`/`delete`. ## Smoke test @@ -78,4 +79,4 @@ Expected: ## Common 401 / 403 causes -If `user list` fails after `ping` succeeds, the token is the problem. See `references/troubleshooting.md` for the full diagnostic flow; the short list is: token unset, token typo, token expired or revoked, or token issued without `Radmin-api` permission. +If `user list` fails after `ping` succeeds, the token is the problem. See `references/troubleshooting.md` for the full diagnostic flow; the short list is: token unset, token typo, token expired or revoked, or token issued for a role that lacks admin permissions. 
diff --git a/skills/fessctl/references/conventions.md b/skills/fessctl/references/conventions.md index 33355a5..4b76946 100644 --- a/skills/fessctl/references/conventions.md +++ b/skills/fessctl/references/conventions.md @@ -7,16 +7,18 @@ Patterns that hold across most fessctl subcommands. Each per-feature reference n Every resource type exposes a uniform set of verbs: ``` -fessctl list # paginated list -fessctl get # one resource by --id -fessctl create # new resource, fields via flags -fessctl update # mutate one resource by --id -fessctl delete # remove one resource by --id +fessctl list # paginated list +fessctl get # one resource by positional ID +fessctl create [flags] # new resource, fields via flags +fessctl update [flags] # mutate one resource by positional ID +fessctl delete # remove one resource by positional ID ``` Exceptions: -- **`crawlinginfo`** is effectively read-only history: it exposes only `list`, `get`, `delete`. There is no `create` or `update`. Verify with `grep '@crawlinginfo_app.command' src/fessctl/commands/crawlinginfo.py` if in doubt. +- **`crawlinginfo`** and **`joblog`** are effectively read-only history: they expose only `list`, `get`, `delete`. There is no `create` or `update`. Verify with `grep '@_app.command' src/fessctl/commands/.py` if in doubt. +- **`user`**, **`role`**, **`group`** add a `getbyname` verb — `fessctl user getbyname ` resolves the row by its human-readable name rather than its opaque ID. +- **`scheduler`** adds `start ` and `stop ` for triggering or halting a job manually. - The top-level `fessctl ping` is not part of any resource group; it is the smoke test for endpoint reachability. When in doubt, run `fessctl --help` to see the verbs offered for a particular resource, then `fessctl --help` for the flag list. @@ -25,8 +27,14 @@ When in doubt, run `fessctl --help` to see the verbs offered for a pa Most resources use opaque Fess-internal IDs (e.g. ULID-like strings). 
Treat them as opaque — they are not URLs, not human-readable, and may differ between environments after an export/import. -- `get`, `update`, `delete` always require `--id`. -- IDs are discovered via `list`. The `id` field is included in `list --output json` output. +- `get`, `update`, `delete` always take the resource ID as a **positional argument**, not a flag. For example: + ```bash + fessctl webconfig get + fessctl webconfig update --num-of-thread 3 + fessctl webconfig delete + ``` + No fessctl subcommand accepts `--id`; if you see one, that is a typo. +- IDs are discovered via `list`. With `--output json` the path is `.response.settings[].id` for most resources (`.response.logs[].id` for `joblog`/`crawlinginfo`). - Do not hard-code IDs across environments. After importing settings into a new Fess server, IDs will be regenerated. ## Pagination @@ -50,11 +58,12 @@ Fields are exposed as Typer options. Required fields are marked with `...` (Type ```bash # Look up first, decide to create or update -existing_id=$(fessctl webconfig list --output json | jq -r '.[] | select(.name=="foo") | .id') +existing_id=$(fessctl webconfig list --output json \ + | jq -r '.response.settings[] | select(.name=="foo") | .id') if [[ -z "$existing_id" ]]; then fessctl webconfig create --name foo ... else - fessctl webconfig update --id "$existing_id" ... + fessctl webconfig update "$existing_id" ... fi ``` @@ -66,8 +75,8 @@ fessctl does not provide native bulk verbs. Compose with shell: ```bash fessctl badword list --output json \ - | jq -r '.[].id' \ - | xargs -I{} fessctl badword delete --id {} + | jq -r '.response.settings[].id' \ + | xargs -I{} fessctl badword delete {} ``` When deleting in bulk, capture a backup first (`fessctl list --output yaml > before.yaml`) so the operation is reversible. 
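
The lookup half of the create-or-update pattern can also be done in Python when `jq` is not available. This is a minimal sketch, assuming the `list --output json` payload has the `.response.settings[]` shape described in this skill; `find_id_by_name` and the sample payload are hypothetical, made up for illustration, and are not part of fessctl.

```python
import json
from typing import Optional


def find_id_by_name(list_payload: str, name: str) -> Optional[str]:
    """Return the ID of the first row whose `name` matches, or None."""
    response = json.loads(list_payload).get("response", {})
    # Assumption: rows live under `settings`, as for webconfig and most resources.
    for row in response.get("settings", []):
        if row.get("name") == name:
            return row.get("id")
    return None


# Hypothetical payload standing in for `fessctl webconfig list --output json`.
sample = json.dumps(
    {
        "response": {
            "status": 0,
            "settings": [
                {"id": "abc123", "name": "foo"},
                {"id": "def456", "name": "bar"},
            ],
        }
    }
)

existing_id = find_id_by_name(sample, "foo")
# Decide which verb to run; the actual fessctl call is left to the caller.
action = "update" if existing_id else "create"
print(action, existing_id)
```

Returning `None` for a missing name keeps the create/update decision in one place, mirroring the empty-string check in the shell version.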
diff --git a/skills/fessctl/references/features/dataconfig.md b/skills/fessctl/references/features/dataconfig.md index f24a8db..d8df0ef 100644 --- a/skills/fessctl/references/features/dataconfig.md +++ b/skills/fessctl/references/features/dataconfig.md @@ -53,7 +53,7 @@ Required on create: `name`, `handler_name`. Recommended in practice: `handler_pa - **labeltype** — assign labels through the script or via permissions to let users filter ingested documents in the UI. - **scheduler** — actual crawling is triggered by a scheduler job (typically the `Default Crawler` job, or a job whose script calls the data-store handler). Without an enabled scheduler job, a DataConfig is inert. -- **fess-ds-\*** plugin — non-built-in `handler_name` values require the corresponding plugin to be installed on the Fess server (see `fessctl plugin install`). The handler name in the config must match the plugin's registered name exactly. +- **fess-ds-\*** plugin — non-built-in `handler_name` values require the corresponding plugin to be installed on the Fess server. fessctl 0.1.0 does not wrap the plugin admin API; install plugins from the Fess admin UI (System → Plugin) or follow the plugin's own install docs. The handler name in the config must match the plugin's registered name exactly. - **JDBC driver / native libs** — `DatabaseDataStore` needs the matching JDBC `.jar` in `app/WEB-INF/lib` and a server restart to be picked up. - **virtualhost** — controls which virtual host the produced documents are visible under. diff --git a/skills/fessctl/references/installation.md b/skills/fessctl/references/installation.md index 2cdb03c..4944c17 100644 --- a/skills/fessctl/references/installation.md +++ b/skills/fessctl/references/installation.md @@ -37,12 +37,12 @@ pipx install fessctl fessctl --help ``` -`pipx` is preferred because it isolates fessctl in its own virtualenv. 
As of v0.1.0 the project is also installable from source: +`pipx` is preferred because it isolates fessctl in its own virtualenv. (Confirm the package is published to PyPI for the version you need; if not, fall back to a source install.) The project is also installable from source: ```bash git clone https://github.com/codelibs/fessctl.git cd fessctl -uv pip install -e src +uv pip install -e . ``` After either install, `command -v fessctl` should print a path on `$PATH` and the detection chain will pick this branch. diff --git a/skills/fessctl/references/output-formats.md b/skills/fessctl/references/output-formats.md index ac718e1..d8ae4ac 100644 --- a/skills/fessctl/references/output-formats.md +++ b/skills/fessctl/references/output-formats.md @@ -1,58 +1,91 @@ # Output formats -Every fessctl subcommand that returns data accepts `--output` (or `-o`) to choose between three serializations. +Every fessctl subcommand that returns data accepts `--output` (or `-o`) to choose between three serializations. The default value is `text` (markdown). ## Available formats | Format | Best for | Notes | |------------|----------|-------| -| `markdown` | Reading in chat or a terminal | Human-friendly tables and headings. **Do not parse it programmatically — column layout is presentational and may change between releases.** | -| `json` | Piping to `jq`, scripting | Stable, machine-readable. Top-level shape is `{"success": ..., "data": ...}` for action results and an array for `list`. Always start here when chaining commands. | -| `yaml` | Hand-editing, version control diffs | Useful for capturing settings to a file you intend to read or edit by eye. | +| `text` (default) | Reading in chat or a terminal | Markdown tables and headings, AI/human friendly. **Do not parse it programmatically — column layout is presentational and may change between releases.** | +| `json` | Piping to `jq`, scripting | Stable, machine-readable. 
Returns the **raw Fess admin API response** as-is — top-level shape is `{"response": { ... }}`. See "JSON shape" below for the keys you will actually navigate. | +| `yaml` | Hand-editing, version control diffs | Same content as `json`, just YAML-encoded. Useful for capturing settings to a file you intend to read or edit by eye. | -The exact set of supported values is implemented in `src/fessctl/utils.py`; if you see a format mentioned in `--help` that is not listed here, prefer `--help` as the source of truth. +The implementation lives in `src/fessctl/utils.py`; if `--help` lists a format not shown here, prefer `--help` as the source of truth. + +## JSON shape + +`fessctl --output json` echoes the upstream Fess admin API payload verbatim, so jq paths must navigate through `.response`. The relevant keys vary by verb: + +| Verb | jq path to the data | +|-------------------|-------------------------------------------| +| `list` | `.response.settings[]` (one per row) | +| `get` | `.response.setting` | +| `create` / `update` | `.response.id` (the new/affected ID), plus `.response.status` (0 = success) | +| `delete` | `.response.status` (0 = success) | +| `joblog list` | `.response.logs[]` | +| `joblog get` | `.response.log` | +| `crawlinginfo list` | `.response.logs[]` | +| `crawlinginfo get` | `.response.log` | + +Status codes other than 0 indicate an error; the message is at `.response.message`. ## When to use which -- **Asking Claude to summarize a list of resources** → `markdown` (concise to read). +- **Asking Claude to summarize a list of resources** → `text` (concise markdown). - **Filtering a list before acting** → `json | jq`. - **Saving config for review or backup** → `yaml`. -- **Anything that another shell command will consume** → `json` and never `markdown`. +- **Anything that another shell command will consume** → `json`, never `text`. 
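
The status rule from the JSON shape section (`0` means success, error text at `.response.message`) can be checked before any field is read. The sketch below is one possible way to do that, assuming every payload carries `.response.status`; `unwrap` and `FessResponseError` are hypothetical names invented for this example, not fessctl APIs.

```python
import json


class FessResponseError(RuntimeError):
    """Hypothetical error type for a non-zero `.response.status`."""


def unwrap(payload: str) -> dict:
    """Return the `.response` object, raising if status is not 0."""
    response = json.loads(payload)["response"]
    status = response.get("status")
    # Assumption: a missing status is treated the same as a failure.
    if status != 0:
        message = response.get("message", "")
        raise FessResponseError(f"status={status}: {message}")
    return response


# Hypothetical payloads standing in for `fessctl ... --output json` output.
ok_payload = json.dumps({"response": {"status": 0, "id": "abc123"}})
bad_payload = json.dumps({"response": {"status": 1, "message": "not found"}})

print(unwrap(ok_payload)["id"])
try:
    unwrap(bad_payload)
except FessResponseError as exc:
    print("failed:", exc)
```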
+ +## Identifying resources on the command line + +Resource IDs are passed as **positional arguments**, not as `--id`: + +```bash +fessctl webconfig get +fessctl badword delete +fessctl user get +``` + +Discover IDs with `list --output json | jq '.response.settings[].id'` (or `.logs[].id` for joblog/crawlinginfo). ## Idiomatic pipelines ```bash # 1. List then filter to high-boost crawl configs -fessctl webconfig list --output json | jq '.[] | select(.boost > 1.0)' +fessctl webconfig list --output json \ + | jq '.response.settings[] | select(.boost > 1.0)' # 2. Extract one field across many rows -fessctl user list --output json | jq -r '.[].name' +fessctl user list --output json \ + | jq -r '.response.settings[].name' -# 3. Capture config to YAML for review -fessctl webconfig get --id --output yaml > webconfig-snapshot.yaml +# 3. Capture one config to YAML for review +fessctl webconfig get --output yaml > webconfig-snapshot.yaml # 4. Save a list as a baseline for diffing later fessctl scheduler list --output json > scheduler-before.json # ... make changes ... fessctl scheduler list --output json > scheduler-after.json -diff <(jq -S . scheduler-before.json) <(jq -S . scheduler-after.json) +diff <(jq -S '.response.settings' scheduler-before.json) \ + <(jq -S '.response.settings' scheduler-after.json) # 5. Bulk delete by filter fessctl badword list --output json \ - | jq -r '.[] | select(.suggest_word | test("^_test")) | .id' \ - | xargs -I{} fessctl badword delete --id {} + | jq -r '.response.settings[] | select(.suggest_word | test("^_test")) | .id' \ + | xargs -I{} fessctl badword delete {} ``` -`fessctl` does not currently accept a `--from-file` style input flag (verified in `src/fessctl/commands/`). To re-create a resource from a captured YAML, read it back yourself and re-issue `create`/`update` with the appropriate flags — for example: +`fessctl` does not provide a `--from-file` import flag (verified in `src/fessctl/commands/`). 
To re-create a resource from a captured YAML, parse the file yourself and reissue `create`/`update` with explicit flags. The captured YAML's payload is at `.response.setting`: ```bash -yq '.data | "--name=" + .name + " --url=" + .urls' webconfig-snapshot.yaml +yq '.response.setting | "--name " + .name + " --url " + .urls' webconfig-snapshot.yaml +# Then feed those flags into a fresh `fessctl webconfig create ...` invocation. ``` -Then call `fessctl webconfig create` with those flags. If you need true round-trip restore, use multiple commands; do not assume a single-flag import path exists. +Round-trip restore requires composing several commands; do not assume a single-flag import path exists. ## Caveats -- `markdown` output is rendered for humans only. Scripts that grep markdown will break the next time the table layout shifts. +- `text` output is rendered for humans only. Scripts that grep markdown will break the next time the table layout shifts. - `json` field names track the Fess admin API directly, so they may differ subtly between Fess versions. Pin `FESS_VERSION` and rerun if you see unexpected keys. - `yaml` output preserves nesting from the underlying API but does not include comments. If you serialize and then re-import, expect identifier and timestamp fields to need stripping. diff --git a/skills/fessctl/references/troubleshooting.md b/skills/fessctl/references/troubleshooting.md index 6fbc8e1..9bae92b 100644 --- a/skills/fessctl/references/troubleshooting.md +++ b/skills/fessctl/references/troubleshooting.md @@ -11,7 +11,7 @@ HTTP 401: Unauthorized Causes: - `FESS_ACCESS_TOKEN` is unset, mistyped, or has been deleted in the admin UI. -- The token was issued for a non-admin user; most fessctl operations require admin (`Radmin-api`) permission. +- The token is bound to a role that does not have admin permissions; most fessctl operations require an admin-grade token (e.g. one bound to `{role}admin`). 
- You exported the token in one shell but are running fessctl from another shell where the env var was not inherited. Recovery: @@ -20,8 +20,8 @@ Recovery: echo "${FESS_ACCESS_TOKEN:-UNSET}" # 1. confirm the env var is exported fessctl ping # 2. ping does NOT need a token; if this works the endpoint is reachable fessctl user list --size 1 # 3. this DOES need a token -# If still 401: re-issue -fessctl accesstoken create --name claude-cli --permissions "Radmin-api" +# If still 401: re-issue with an admin-bound permission +fessctl accesstoken create --name claude-cli --permission "{role}admin" export FESS_ACCESS_TOKEN= ``` @@ -33,14 +33,15 @@ HTTP 404: Not Found Causes: -- The `--id` you passed does not exist in this Fess server (typo, copy-pasted from a different env, or the resource was deleted). +- The positional ID you passed does not exist in this Fess server (typo, copy-pasted from a different env, or the resource was deleted). - `FESS_ENDPOINT` points at the wrong server. - A previous `delete` ran successfully and the ID is now gone. Recovery: ```bash -fessctl list --output json | jq '.[] | {id,name}' +fessctl list --output json | jq '.response.settings[] | {id,name}' +# joblog / crawlinginfo use .response.logs[] instead ``` Find the live ID, retry. Resource IDs do not survive an export/import — see `references/conventions.md`. @@ -77,8 +78,11 @@ Cause: `FESS_VERSION` does not match the running server's major.minor. fessctl s Recovery: ```bash -curl -fsS "${FESS_ENDPOINT}/api/v1/health" | jq . # find the running Fess version -export FESS_VERSION= +# Confirm the endpoint is reachable; cluster_name and status come back here, +# but the Fess server version is not in this payload — get it from the +# admin UI footer or from the codelibs/fess release tag your operator deployed. +curl -fsS "${FESS_ENDPOINT}/api/v1/health" | jq . +export FESS_VERSION= ``` If the Fess server is newer than any version fessctl knows about, you may also need to update fessctl itself. 
@@ -108,7 +112,8 @@ Causes: Recovery: ```bash -fessctl list --page 1 --size 100 --output json | jq 'length' +fessctl list --page 1 --size 100 --output json \ + | jq '.response.settings | length' ``` If `length` is 0, the resource really is empty in this environment. Confirm you are pointed at the right `FESS_ENDPOINT`. diff --git a/skills/fessctl/references/workflows.md b/skills/fessctl/references/workflows.md index fd792b2..750f348 100644 --- a/skills/fessctl/references/workflows.md +++ b/skills/fessctl/references/workflows.md @@ -4,6 +4,12 @@ Each per-feature reference covers one resource. This file collects the small num All recipes assume `FESS_ENDPOINT`, `FESS_ACCESS_TOKEN`, and `FESS_VERSION` are exported. See `references/authentication.md`. +Conventions used below: + +- Resource IDs are **positional arguments**, not `--id` flags. +- `--output json` returns the raw Fess admin API payload — list rows live at `.response.settings[]` (or `.response.logs[]` for `joblog` / `crawlinginfo`); single rows at `.response.setting`. +- Several `create` commands require `--version-no` for optimistic-locking (use `1` for new rows). Confirm with `fessctl create --help`. + --- ## Recipe 1 — Stand up a fresh web crawl @@ -16,35 +22,45 @@ All recipes assume `FESS_ENDPOINT`, `FESS_ACCESS_TOKEN`, and `FESS_VERSION` are # 1. Smoke test fessctl ping -# 2. (Optional) Auth credentials for the target site -# Skip this step if the target is public. -fessctl webauth create --name corp-portal \ - --hostname portal.example.com \ - --username svc-crawler \ - --password '' - -# 3. (Optional) Label so search results can be filtered later -fessctl labeltype create --name engineering --value Engineering +# 2. (Optional) Label so search results can be filtered later +fessctl labeltype create \ + --name engineering \ + --value Engineering -# 4. The crawl config itself -fessctl webconfig create \ +# 3. 
The crawl config itself (must exist before webauth can bind to it) +WEB_CONFIG_ID=$(fessctl webconfig create \ --name corp-portal-docs \ --url "https://portal.example.com/docs/" \ - --included-urls "https://portal.example.com/docs/.*" \ - --boost 1.0 + --included-url "https://portal.example.com/docs/.*" \ + --boost 1.0 \ + --output json | jq -r '.response.id') +echo "Created webconfig $WEB_CONFIG_ID" -# 5. Find the scheduler job that runs web crawls +# 4. (Optional) Auth credentials for the target site — bind by webconfig ID. +# Skip this step if the target is public. +fessctl webauth create \ + --web-config-id "$WEB_CONFIG_ID" \ + --hostname portal.example.com \ + --username svc-crawler \ + --password '' \ + --protocol-scheme https \ + --auth-realm BASIC + +# 5. Find the scheduler job that runs web crawls (typically "Default Crawler" +# or any job whose script invokes the web crawler). Inspect, do not assume +# a single ID — your Fess install may have customized it. fessctl scheduler list --output json \ - | jq '.[] | select(.name | test("WebCrawler"; "i")) | {id,name,available}' + | jq '.response.settings[] | select(.name | test("Crawler"; "i")) | {id,name,available}' -# 6. Trigger that job (subcommand may vary; confirm with --help) -fessctl scheduler --help +# 6. Trigger the matching scheduler job (positional ID). +fessctl scheduler start -# 7. Confirm the crawl started and finished -fessctl joblog list --output json | jq '.[0]' +# 7. Confirm the crawl started and finished. +fessctl joblog list --output json \ + | jq '.response.logs[0]' ``` -**Verify:** the matching `joblog` entry transitions to a non-error terminal state, and `crawlinginfo list` shows a session for the new config. +**Verify:** the most-recent `joblog` entry transitions to a non-error terminal status, and `fessctl crawlinginfo list --output json | jq '.response.logs[0]'` shows a session created near the same timestamp. 
--- @@ -52,28 +68,34 @@ fessctl joblog list --output json | jq '.[0]' **Goal:** Move crawl/auth/label configuration from `fess-staging` to `fess-prod`. -**Prerequisites:** admin tokens for both environments; fields like IDs and timestamps will be regenerated on the destination. +**Prerequisites:** admin tokens for both environments. IDs and timestamps are regenerated on the destination, so any cross-resource references (e.g. `webauth.web_config_id`) must be remapped during re-create. ```bash -# 1. Export from source +# 1. Export from source. export FESS_ENDPOINT=https://fess-staging.example.com export FESS_ACCESS_TOKEN=$STAGING_TOKEN mkdir -p ./fess-export -for r in webauth fileauth labeltype webconfig fileconfig dataconfig pathmap; do +for r in labeltype webconfig fileconfig dataconfig pathmap webauth fileauth; do fessctl "$r" list --output yaml > "./fess-export/${r}.yaml" done -# 2. Switch to destination +# 2. Switch to destination. export FESS_ENDPOINT=https://fess-prod.example.com export FESS_ACCESS_TOKEN=$PROD_TOKEN -# 3. Re-create each resource on the destination (no native --from-file at v0.1.0; -# you must read each YAML and reissue create with explicit flags). For example: -yq '.data[] | "fessctl labeltype create --name=" + .name + " --value=" + .value' \ +# 3. Re-create each resource on the destination. fessctl 0.1.0 has no +# --from-file import flag, so each YAML must be parsed and replayed +# as explicit flags. Values are shell-quoted with @sh so names containing +# spaces or quotes survive the pipe to sh. Example for labeltype: +yq -r '.response.settings[] | + "fessctl labeltype create --name " + (.name | @sh) + " --value " + (.value | @sh)' \ ./fess-export/labeltype.yaml | sh + +# 4. Import in dependency order: labeltype → webconfig/fileconfig/dataconfig → webauth/fileauth. +# Map IDs as you go: capture the new webconfig ID returned by `create --output json` +# and use it for the matching webauth row. ``` -**Verify:** `fessctl list` on the destination matches what you expected. Diff the YAML exports if the migration is reversible.
Re-creating dependents (webconfig depends on labeltype/webauth) requires that you reload IDs after the prerequisite resources land. +**Verify:** `fessctl list` on the destination matches the expected set. Diff the YAML exports if the migration is reversible (and after stripping `id`, `created_time`, `updated_time`, and any other server-populated fields). --- @@ -82,25 +104,24 @@ yq '.data[] | "fessctl labeltype create --name=" + .name + " --value=" + .value' **Goal:** Find out why a scheduled crawl is producing errors or no documents. ```bash -# 1. Which job is failing? +# 1. Which scheduler jobs are enabled? fessctl scheduler list --output json \ - | jq '.[] | select(.available == true)' \ - | head + | jq '.response.settings[] | select(.available == true) | {id,name,target}' -# 2. Most recent job runs and their statuses +# 2. Most recent job runs and their statuses (snake_case fields, see joblog.md). fessctl joblog list --output json \ - | jq '.[] | {id, jobName, jobStatus, scriptResult, startTime, endTime}' \ + | jq '.response.logs[] | {id, job_name, job_status, script_result, start_time, end_time}' \ | head -40 -# 3. For a specific failing run, fetch the full log -fessctl joblog get --id --output yaml +# 3. For a specific failing run, fetch the full log (positional ID). +fessctl joblog get --output yaml -# 4. Look at the crawl session metadata +# 4. Look at the crawl session metadata. fessctl crawlinginfo list --output json \ - | jq '.[] | {sessionId, name, createdTime, expiredTime}' \ + | jq '.response.logs[] | {session_id, name, value, created_time, expired_time}' \ | head -fessctl crawlinginfo get --id +fessctl crawlinginfo get ``` If the joblog points at HTTP errors, suspect the target site or `webauth`. If the joblog points at index errors, suspect Fess/OpenSearch health (out of skill scope). For environment-level errors (401, connection refused) see `references/troubleshooting.md`. 
@@ -109,28 +130,28 @@ If the joblog points at HTTP errors, suspect the target site or `webauth`. If th ## Recipe 4 — Provision a user with admin permissions -**Goal:** Create the role and group structure first, then attach a user. Order matters so the user references existing role/group IDs. +**Goal:** Create the role and group structure first, then attach a user. Order matters so the user references existing role/group names. ```bash -# 1. Role (or reuse an existing one) -fessctl role list --output json | jq -r '.[].name' -fessctl role create --name search-admin - -# 2. Group (optional; useful for organizing users) -fessctl group create --name platform-team - -# 3. The user, tied to the role and group -fessctl user create \ - --name alice \ - --password '' \ - --roles search-admin \ - --groups platform-team - -# 4. Verify -fessctl user list --output json | jq '.[] | select(.name=="alice")' +# 1. Role (or reuse an existing one). `name` is a positional argument. +fessctl role list --output json | jq -r '.response.settings[].name' +fessctl role create search-admin + +# 2. Group (optional; useful for organizing users). +fessctl group create platform-team + +# 3. The user. `name` and `password` are positional; --role and --group are +# singular flags that may be repeated. +fessctl user create alice '' \ + --role search-admin \ + --group platform-team + +# 4. Verify. +fessctl user list --output json \ + | jq '.response.settings[] | select(.name=="alice")' ``` -**Caution:** `delete` order is the reverse — drop users before deleting the role/group they reference, otherwise Fess keeps dangling references. +**Caution:** `delete` order is the reverse — drop users before deleting the role/group they reference, otherwise Fess keeps dangling permission strings. --- @@ -138,30 +159,40 @@ fessctl user list --output json | jq '.[] | select(.name=="alice")' **Goal:** Ensure that searches for "release notes" surface a curated document and demote noisy hits. 
+These `create` commands require fields that are easy to forget — `--version-no 1` for new rows, `--sort-order` for boostdoc, `--max-size` and `--boost` for keymatch. `relatedquery --queries` accepts a single newline-separated string (use `$'a\nb'` shell quoting). + ```bash -# 1. Boost any document whose URL matches a pattern +# 1. Boost any document whose URL matches a regex. fessctl boostdoc create \ - --url-expr 'url:"https://docs.example.com/release/*"' \ - --boost-expr '2.0' + --url-expr 'url:"https://docs.example.com/release/.*"' \ + --boost-expr '2.0' \ + --sort-order 1 -# 2. Pin a specific document for a specific query term +# 2. Pin a specific document for a specific query term. fessctl keymatch create \ --term 'release notes' \ --query 'url:"https://docs.example.com/release/latest"' \ --max-size 10 \ - --boost 100.0 + --boost 100.0 \ + --version-no 1 + +# 3. Promote a phrase in suggest output. +fessctl elevateword create \ + --suggest-word 'release notes' \ + --boost 100.0 \ + --version-no 1 -# 3. Suggest a related query when the user types a near-term -fessctl elevateword create --suggest-word "release notes" --boost 100.0 +# 4. Surface alternative queries when the user types the term. fessctl relatedquery create \ --term 'release notes' \ - --queries '["changelog","what is new"]' + --queries $'changelog\nwhat is new' \ + --version-no 1 -# 4. Filter out noise -fessctl badword create --suggest-word "internal-test" +# 5. Suppress noise. +fessctl badword create --suggest-word internal-test ``` -**Caution:** Several of these tunings take effect only after the suggest cache and/or search index are reloaded. If changes do not appear in search results, trigger the relevant scheduler job (see Recipe 1, step 5). +**Caution:** several of these tunings take effect only after the suggest dictionary is rebuilt or the search index is reloaded. 
If changes do not appear in search results, trigger the matching maintenance scheduler job (inspect `fessctl scheduler list --output json | jq '.response.settings[] | select(.name | test("Suggest|Index"; "i"))'`). --- From 0b586d82c1bcdbb9da4ec6e1eca8dd88be8dc23d Mon Sep 17 00:00:00 2001 From: Shinsuke Sugaya Date: Sun, 3 May 2026 22:04:35 +0900 Subject: [PATCH 15/16] docs(skill): clarify accesstoken create CLI vs server-side requirements --- skills/fessctl/references/features/accesstoken.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/fessctl/references/features/accesstoken.md b/skills/fessctl/references/features/accesstoken.md index a2f8f9f..4c4f6e4 100644 --- a/skills/fessctl/references/features/accesstoken.md +++ b/skills/fessctl/references/features/accesstoken.md @@ -44,7 +44,7 @@ Always reconfirm with `fessctl accesstoken --help`. } ``` -Required on create: `name` and at least one entry in `permissions` (passed as repeated `--permission` flags, joined with newlines by fessctl). `expires` is optional but recommended for non-service tokens; the CLI accepts `yyyy-MM-ddTHH:mm:ss`. `parameter_name` is optional and only meaningful for trusted internal embedding scenarios. `crud_mode` is set automatically (`1` for create, `2` for update). +Required by the CLI: `--name`. The Fess server typically also expects at least one `--permission` (passed as repeated flags, joined with newlines by fessctl) — without it the resulting token grants nothing useful, even though Typer accepts the call. `expires` is optional but recommended for non-service tokens; the CLI accepts `yyyy-MM-ddTHH:mm:ss`. `parameter_name` is optional and only meaningful for trusted internal embedding scenarios. `crud_mode` is set automatically (`1` for create, `2` for update). 
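Because the server, not the CLI, is the one that objects to a malformed expiry, it can help to check the `yyyy-MM-ddTHH:mm:ss` shape before calling fessctl. The sketch below is a hedged illustration: `is_valid_expires` is a hypothetical helper, and the `--expires` flag name and the `{role}search-admin` permission string in the commented call are assumptions to confirm with `fessctl accesstoken create --help`.

```bash
# is_valid_expires VALUE
# Hypothetical helper: returns 0 when VALUE has the yyyy-MM-ddTHH:mm:ss
# shape, 1 otherwise. Shape check only: it does not reject impossible
# dates such as month 13.
is_valid_expires() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9])
      return 0 ;;
    *)
      return 1 ;;
  esac
}

expires="2026-12-31T23:59:59"
if is_valid_expires "$expires"; then
  echo "expires looks well-formed: $expires"
  # Assumed flag names and permission syntax; confirm with --help first.
  # fessctl accesstoken create --name ci-reader \
  #   --permission '{role}search-admin' \
  #   --expires "$expires"
else
  echo "expires must look like yyyy-MM-ddTHH:mm:ss" >&2
fi
```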
## Relationships From 8ab78251dd96a3a264a14be6d3d0b144ad8879a4 Mon Sep 17 00:00:00 2001 From: Shinsuke Sugaya Date: Sun, 3 May 2026 22:19:26 +0900 Subject: [PATCH 16/16] fix(skill): drop fess-workspace-specific detection branch MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The skill is published for general fessctl users, not just contributors working inside a fess-workspace clone. The 'detect FESS_WORKSPACE/repos/fessctl + uv run' branch assumes a layout that does not exist for installed users and pollutes the documentation with workspace-only paths. - SKILL.md detection chain reduced to PATH -> docker (2 branches). - installation.md drops Option B (fess-workspace dev mode) and consolidates PATH installs (pipx / uv tool / source) under Option A. Source install is the right path for anyone working from a local checkout — once installed with 'uv pip install -e .' it shows up on PATH and the chain picks it. - troubleshooting.md generalises the 'slow first invocation' note away from repos/fessctl-specific wording. --- skills/fessctl/SKILL.md | 5 +-- skills/fessctl/references/installation.md | 40 ++++++-------------- skills/fessctl/references/troubleshooting.md | 6 +-- 3 files changed, 17 insertions(+), 34 deletions(-) diff --git a/skills/fessctl/SKILL.md b/skills/fessctl/SKILL.md index ba03448..c55c003 100644 --- a/skills/fessctl/SKILL.md +++ b/skills/fessctl/SKILL.md @@ -11,9 +11,8 @@ version: 0.1.0 ## Detection (run in this order) -1. `command -v fessctl` → use it directly -2. `$FESS_WORKSPACE/repos/fessctl` exists AND `command -v uv` → `cd $FESS_WORKSPACE/repos/fessctl && uv run fessctl` -3. Fall back to `docker run --rm -e FESS_ENDPOINT -e FESS_ACCESS_TOKEN -e FESS_VERSION ghcr.io/codelibs/fessctl:` +1. `command -v fessctl` → use it directly (covers `pipx install fessctl`, `uv tool install fessctl`, manual `uv pip install -e .`, or any future package-manager install). +2. 
Fall back to `docker run --rm -e FESS_ENDPOINT -e FESS_ACCESS_TOKEN -e FESS_VERSION ghcr.io/codelibs/fessctl:`. See `references/installation.md` for the exact wrappers. diff --git a/skills/fessctl/references/installation.md b/skills/fessctl/references/installation.md index 4944c17..46d00a6 100644 --- a/skills/fessctl/references/installation.md +++ b/skills/fessctl/references/installation.md @@ -1,6 +1,6 @@ # Installing & Invoking fessctl -`fessctl` is delivered as a Python package and a published Docker image. The skill picks one of three runners depending on what is available locally. +`fessctl` is delivered as a Python package and a published Docker image. The skill picks one of two runners depending on what is available locally. ## Detection chain @@ -12,10 +12,6 @@ resolve_fessctl() { FESSCTL="fessctl" return fi - if [[ -d "${FESS_WORKSPACE:-$PWD}/repos/fessctl" ]] && command -v uv >/dev/null 2>&1; then - FESSCTL="uv --directory ${FESS_WORKSPACE:-$PWD}/repos/fessctl run fessctl" - return - fi FESSCTL="docker run --rm \ -e FESS_ENDPOINT -e FESS_ACCESS_TOKEN -e FESS_VERSION \ --add-host=host.docker.internal:host-gateway \ @@ -26,42 +22,30 @@ resolve_fessctl $FESSCTL ping ``` -The order matters: a system-PATH `fessctl` (e.g. installed via `pipx` or any future package manager) is the fastest invocation, followed by an in-tree `uv run` against `repos/fessctl`, with the published Docker image as the universal fallback. +The order matters: a system-PATH `fessctl` is the fastest invocation; Docker is the universal fallback that works on any machine with a Docker daemon. ## Option A — system PATH install -Recommended for end users who do not have a `fess-workspace` checkout. - -```bash -pipx install fessctl -fessctl --help -``` - -`pipx` is preferred because it isolates fessctl in its own virtualenv. (Confirm the package is published to PyPI for the version you need; if not, fall back to a source install.) 
The project is also installable from source: +Recommended for end users. Any install method that puts `fessctl` on `$PATH` is picked up by the detection chain automatically. ```bash +pipx install fessctl # if published to PyPI +# or +uv tool install fessctl # if published to PyPI +# or, from a local source checkout: git clone https://github.com/codelibs/fessctl.git cd fessctl uv pip install -e . +fessctl --help ``` -After either install, `command -v fessctl` should print a path on `$PATH` and the detection chain will pick this branch. - -## Option B — fess-workspace dev mode - -Use this when you are actively editing fessctl source inside a `fess-workspace` clone. Local edits are picked up on the next invocation. - -```bash -cd $FESS_WORKSPACE/repos/fessctl -uv sync -uv run fessctl --help -``` +`pipx` and `uv tool` are preferred because they isolate fessctl in its own virtualenv. Confirm the package is published to PyPI for the version you need; if not, the source install is the fallback. -`uv sync` only needs to run when `pyproject.toml` or `uv.lock` changes; subsequent calls reuse the cached environment. This branch is what the detection chain selects when `repos/fessctl` exists and `uv` is on `$PATH`. +After install, `command -v fessctl` should print a path on `$PATH` and the detection chain will pick this branch. -## Option C — Docker +## Option B — Docker -Use this when neither a PATH install nor a fess-workspace clone is available. +Use this when a PATH install is not possible or desired. ```bash docker run --rm \ diff --git a/skills/fessctl/references/troubleshooting.md b/skills/fessctl/references/troubleshooting.md index 9bae92b..2af33ed 100644 --- a/skills/fessctl/references/troubleshooting.md +++ b/skills/fessctl/references/troubleshooting.md @@ -118,11 +118,11 @@ fessctl list --page 1 --size 100 --output json \ If `length` is 0, the resource really is empty in this environment. Confirm you are pointed at the right `FESS_ENDPOINT`. 
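One quick way to confirm the endpoint is to print the value the next call would use. This is a small sketch: `effective_endpoint` is a hypothetical helper, and it assumes an unset `FESS_ENDPOINT` falls back to `http://localhost:8080`, the default listed in `SKILL.md`.

```bash
# Hypothetical helper: print the endpoint the next fessctl call is
# expected to target.
# Assumption: an unset FESS_ENDPOINT means http://localhost:8080.
effective_endpoint() {
  printf '%s\n' "${FESS_ENDPOINT-http://localhost:8080}"
}

echo "fessctl target: $(effective_endpoint)"
```

If the printed host is not the environment you meant to inspect, re-export `FESS_ENDPOINT` and `FESS_ACCESS_TOKEN` together, since a token from one environment is not expected to work on another.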
-## `uv run fessctl` slow on the first invocation +## Slow first invocation -Cause: cold virtualenv build inside `repos/fessctl/.venv`. +Cause: when fessctl is installed from source, the first call may build the virtualenv on demand. -Recovery: run `uv sync` once up front in `repos/fessctl`. Subsequent `uv run fessctl ...` calls reuse the cached environment and start in well under a second. +Recovery: install with `pipx install fessctl`, `uv tool install fessctl`, or pre-warm a source install with `uv sync` / `uv pip install -e .` once up front. Subsequent calls start in well under a second. ## Docker pull or auth errors against `ghcr.io`