fix: wreq migration, gzip decompression fix, Parallel/You providers, reliability hardening #6
Open
Zireael wants to merge 20 commits into
…ability Config, cache, timeout, and rejection-diagnostics hardening:
- config: type-numeric writes for settings.timeout/count (hbq1), legacy quoted-numeric coercion (hbq2)
- cache: skip caching all-provider-failed and degraded-empty responses (hbq3)
- engine: unified timeout budget from settings.timeout (hbq5), remove special-mode literals (hbq6), provider count clamping for Brave cap (hbq7)
- types/errors: structured providers_failed_detail taxonomy with cause/action/signature fields, backward-compatible (hbq4, hbq13, hbq14)
- providers: spawn_blocking extraction offload in stealth/browserless (hbq9), Exa NUM_RESULTS_EXCEEDED and Jina Cloudflare-1010/Browserless auth-mode rejection classification (hbq13)
- main/logging: env-driven tracing subscriber with quiet default, structured reliability events (hbq8)
- README: troubleshooting rejection diagnostics section (hbq15)
- clippy cleanup: unused vars, range pattern, test module ordering
- build: fix backon v1 retry callback (use .notify() on retry future)
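The cache rule in this commit (skip all-provider-failed and degraded-empty responses) can be pictured roughly as below. This is a minimal sketch, not the PR's code: the `Outcome` struct, its fields, and `should_cache_sketch` are hypothetical names, and treating "degraded-empty" as "zero results while at least one provider failed" is an assumption.

```rust
/// Hypothetical, simplified outcome of one search (not the PR's real type).
struct Outcome {
    results: usize,          // results returned to the caller
    providers_tried: usize,  // providers that were actually queried
    providers_failed: usize, // providers that returned an error
}

/// Decide whether a response is worth writing to the cache.
fn should_cache_sketch(o: &Outcome) -> bool {
    // Every queried provider failed: caching would pin the failure.
    let all_failed = o.providers_tried > 0 && o.providers_failed == o.providers_tried;
    // Assumption: "degraded-empty" means no results while some provider failed.
    let degraded_empty = o.results == 0 && o.providers_failed > 0;
    !(all_failed || degraded_empty)
}
```

Under these assumptions an empty result set with no provider failures is still cached, since it is a legitimate answer rather than a transient fault.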
The rquest HTTP client crate has been renamed to wreq, and the old packages will be yanked. This commit migrates all references:
- Cargo.toml: rquest -> wreq v5, rquest-util -> wreq-util v2
- src/errors.rs: SearchError::Rquest -> SearchError::Wreq
- src/providers/stealth.rs: imports and types updated
- src/engine.rs: error variant match updated
- .github/workflows/release.yml: comment updated
Uses wreq v5.3.0 + wreq-util v2.2.6 (both stable), which provide the same v5 API as rquest. This is purely a crate rename with no behavior change.
Closes paperfoot#4
Cherry-picked from andrey-golovko/search-cli fix/linux-build branch.
- Remove readability crate (pulled reqwest with native-tls/OpenSSL)
- Replace readability extraction with tl-based title + tag-stripping fallback
- Keep spawn_blocking offload for extraction from reliability hardening PR
- self_update: default-features = false to avoid native-tls
Cherry-picked from mouse-value-add/search-cli feat/you-search-provider.
- New You.com provider with general search and news search
- Freshness mapping, domain include/exclude filters
- Auth, API status, and rate-limit error handling
- Wired into engine routing, config, CLI, and docs
Bug fixes:
- browserless extract_text_simple now skips <script>/<style> content
- Extract/Scrape chain uses shared deadline to prevent timeout overflow
- stealth provider maps HTTP errors as SearchError::Api (not Config)
- finalize_response() wired into execute_search return path
- retry_request .when() now also matches SearchError::Wreq errors
- Cross-platform home_dir() resolves $HOME on Unix, %USERPROFILE% on Windows
- Cache write failures now log warnings instead of being silently ignored
DRY refactoring:
- Shared augment_query() extracted to providers/mod.rs (3 copies removed)
- Shared map_freshness() extracted to types.rs (2 copies removed)
- Shared extract_title() extracted to providers/mod.rs (2 copies removed)
- Shared epoch_days_to_date() extracted to utils.rs (2 copies removed)
- execute_special refactored with try_provider/try_provider_remaining helpers (~160 lines of boilerplate eliminated)
Cleanup:
- Removed unused Provider::timeout() trait method + 12 provider impls
- Removed build_providers() call from execute_special (avoids unused instances)
Enhancements:
- Cache file eviction on startup removes expired q_*.json files
Test coverage:
- 13 classify tests (social/news/academic/scholar/patents/people/extract/similar/images/places/general/priority + 12 SE-focused)
- 3 engine tests (normalize_url, provider_allowed)
- 4 browserless extract_text_simple tests (script/style skip)
- 5 you.com provider tests (JSON deserialization)
- 8 cache logic tests (should_cache_query_response, path determinism)
- 5 additional normalize_url edge cases
95 tests pass, 0 clippy warnings.
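To illustrate the first bug fix (dropping <script>/<style> content during text extraction), here is a minimal std-only sketch. It is not the PR's extract_text_simple: `extract_text_sketch` is a hypothetical name, and the tag handling is deliberately naive (no comments, entities, or attribute values containing `>`).

```rust
/// Strip tags from `html`, also dropping everything inside <script> and <style>.
fn extract_text_sketch(html: &str) -> String {
    // ASCII lowercasing keeps byte offsets identical to `html`.
    let lower = html.to_ascii_lowercase();
    let mut out = String::new();
    let mut i = 0;
    while i < html.len() {
        // Copy text up to the next tag; no more tags means copy the rest.
        let Some(rel) = html[i..].find('<') else {
            out.push_str(&html[i..]);
            break;
        };
        let open = i + rel;
        out.push_str(&html[i..open]);
        // An unterminated tag swallows the remainder of the input.
        let Some(close_rel) = html[open..].find('>') else { break };
        let tag_end = open + close_rel + 1;
        let mut next = tag_end;
        for name in ["script", "style"] {
            // Naive prefix match: a tag such as <scripty> would match too.
            if lower[open + 1..].starts_with(name) {
                let closing = format!("</{name}");
                // Jump past the matching closing tag, or to the end if missing.
                next = match lower[tag_end..].find(&closing) {
                    Some(p) => {
                        let c = tag_end + p;
                        lower[c..].find('>').map_or(html.len(), |q| c + q + 1)
                    }
                    None => html.len(),
                };
            }
        }
        out.push(' '); // a tag boundary separates words
        i = next;
    }
    // Collapse runs of whitespace into single spaces.
    out.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

A call such as `extract_text_sketch("<p>Hi</p><script>var x = 1 < 2;</script> there")` keeps the visible text and discards the script body, including the stray `<` inside it.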
- Rename misleading variable name in test_failure_metadata_includes_api_reason_and_legacy_list
- Variable actually holds browserless entry but was named stealth_detail (copy-paste error)
- Also include other code review fixes from uncommitted changes
- Add parallel to cli.rs PROVIDERS section (12->13 providers)
- Add parallel to -p/--providers help text
- Add parallel to config show configured providers list
- Move -q/--query from args to options in agent-info schema (P1-02)
Co-authored-by: Atlas <atlas@ohmyopencode.ai>
Add providers_skipped vector to distinguish between providers that are not configured (skipped) versus those that errored during execution. This improves error reporting clarity in search results by showing which providers were skipped due to missing API keys.
- Add providers_skipped tracking in engine.rs for both regular and special searches
- Report brave/serper skip status in auto mode when API keys are missing
- Update errors.rs with new skip tracking in SearchResult
- P0: Fix providers_skipped declaration order in engine.rs
- P0: Add serde(default) to providers_skipped in types.rs
- P0: Add providers_skipped to minimal_response helper in cache.rs
- P0: Add serde(skip_serializing_if) to ErrorDetail fields in types.rs
- P0: Guard save_last/save_query with should_cache_query_response in main.rs
- P1: Remove timing assertions from cache tests (brittle on CI)
- P1: Move Tavily API key from POST body to Authorization header
- P1: Fix ErrorDetail fields to serialize as null (not omitted) for test compatibility
- P2: Fix normalize_url uppercase WWW. handling in engine.rs
- P2: Capture email in verify.rs error path for semaphore failures
- P2: Add server_error to retry_request predicate in providers/mod.rs
- P2: Add SSRF URL validation to providers/stealth.rs
- P2: Add domain injection protection to augment_query in providers/mod.rs
- P2: Redact API keys in error display and config output
- P3: Remove redundant sanitize_url_error unused function
- P3: Fix provider count in help output test (12->13)
- P3: Fix test_error_response_includes_actionable_rejection_fields
All 127 tests pass (91 unit + 36 integration).
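The normalize_url fix for an uppercase WWW. prefix comes down to lowercasing the host before stripping the prefix. The sketch below shows only that step; `normalize_host_sketch` is a hypothetical helper, and the real normalize_url presumably handles schemes, paths, and other details not shown in this PR text.

```rust
/// Lowercase a host name and drop one leading "www." label, if present.
fn normalize_host_sketch(host: &str) -> String {
    // Lowercase first, so "WWW." and "Www." are matched like "www.".
    let lower = host.to_ascii_lowercase();
    lower.strip_prefix("www.").unwrap_or(&lower).to_string()
}
```

With this ordering, `WWW.Example.com` and `www.example.com` collapse to the same value, which is the behavior deduplication needs.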
…le README with codebase
- Add sanitize_argv() to strip JS null/undefined args before Clap parsing
- Add Parallel provider (api.parallel.ai) throughout README: headline, providers table, mode→provider mappings, env vars, quick start
- Document all CLI subcommands: providers, skill, config path, verify, update
- Document search flags: --freshness, --domain, --exclude-domain, --last
- Add Reliability section: retry (3x, 1-4s backoff), provider_timeout, min_results
- Add Caching section: 5-min TTL, failure exclusion, --last flag
- Document agent integration assets: SKILL.md and OpenCode tool schema
- Update Cargo.toml description: 12→13 providers, drop email verification
- Add .cargo/ to .gitignore
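A rough picture of what sanitize_argv() does, assuming it simply drops the literal tokens a JavaScript caller produces when it stringifies missing values. `sanitize_argv_sketch` is a hypothetical stand-in; the PR's actual function may treat flag/value pairs differently.

```rust
/// Drop arguments that are the literal strings "null" or "undefined".
fn sanitize_argv_sketch<I>(args: I) -> Vec<String>
where
    I: IntoIterator<Item = String>,
{
    args.into_iter()
        .filter(|a| a.as_str() != "null" && a.as_str() != "undefined")
        .collect()
}
```

In a binary this would typically wrap `std::env::args()` before the result is handed to the argument parser. Note that this sketch leaves a flag in place when only its value was removed, so a real implementation may also need to drop the now-dangling flag.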
…ovider The Brave Search API requires gzip-compressed responses (per their docs), but reqwest was compiled without the gzip feature, causing json_error failures when Brave's CDN returned compressed responses that simd_json could not parse.
- Add 'gzip' to reqwest features in Cargo.toml (enables automatic Accept-Encoding header sending and response decompression)
- Add explicit Accept-Encoding: gzip header to all 3 Brave endpoint request builders for deterministic CDN behavior
- Fixes search-cli-lpc.1 and search-cli-lpc.2
- Delete root SKILL.md, already moved to assets/.agents/skills/search-cli/SKILL.md
- Update include_str! path in src/cli.rs to match new location
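For context on why the JSON parser failed: a gzip stream always begins with the two magic bytes 0x1f 0x8b (RFC 1952), so a compressed body can never start like JSON. The check below is a hypothetical diagnostic that is not part of this PR; `looks_gzipped` and `check_json_body` are illustrative names only.

```rust
/// First two bytes of every gzip stream (ID1, ID2 in RFC 1952).
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];

/// True when `body` starts with the gzip magic bytes.
fn looks_gzipped(body: &[u8]) -> bool {
    body.starts_with(&GZIP_MAGIC)
}

/// Hypothetical guard to run before handing bytes to a JSON parser.
fn check_json_body(body: &[u8]) -> Result<&[u8], String> {
    if looks_gzipped(body) {
        // The body was never decompressed; parsing it as JSON cannot succeed.
        Err("response body is still gzip-compressed".to_string())
    } else {
        Ok(body)
    }
}
```

A guard like this would turn an opaque json_error into a message that points directly at missing decompression.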
These three providers use the same .bytes()+simd_json pattern as Brave and are vulnerable to gzip-compressed responses being parsed as raw bytes. Adding Accept-Encoding: gzip ensures deterministic CDN behavior and pairs with the reqwest gzip feature (already enabled) for automatic decompression.
Spike search-cli-lpc.3 findings:
- 3 vulnerable: serper, exa, xai (.bytes()+simd_json)
- 7 safe: tavily, perplexity, you, jina, serpapi, parallel, firecrawl (.json())
- stealth: safe (uses wreq, not reqwest, has its own Accept-Encoding)
… search-cli optimization program
- Add you.com web+news search provider (src/providers/you.rs, src/main.rs, src/types.rs)
- Integrate you into default Auto/General/News/Deep provider sets (src/engine.rs)
- Add config key YOU_API_KEY and CLI validation (src/config.rs, src/cli.rs)
- Add coding research skill: agent guidance for search tool usage with strategy tables, query playbook, wrapper contract reference, and refactor notes
- Add comprehensive improvement recommendations document (34→33 items across P0/P1/P2)
- Create beads implementation plan (epic + 4 child phase beads with dependency graph)
- Add AGENTS.md and CLAUDE.md project configuration
- search.ts: add current_date (YYYY-MM-DD) to all tool output paths (providers, error, and success responses) and to toolDebug block. Add AGENT RULES #11 telling agents to check current_date before shaping queries to avoid outdated-year searches.
- SKILL.md: add Hard rules #5 (Check current_date) and a Query shaping rule warning against hard-coding years, with anti-pattern examples and correct pattern guidance.
Closes: search-cli-qs3, search-cli-2kc
…RLs in classifier
- Extend query_cache_path/save/load signatures with providers, domains, excludes, and freshness so different query configs do not collide
- Remove old guard that only cached default (no-provider) queries; always try cache with extended key instead
- Add URL detection (http/https/ftp) to classify intent as Extract before regex matching
- Add security regex patterns (CVE, advisory, exploit, etc.) for future use
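The extended cache key can be sketched as a hash over every input that changes the result set. Everything here is an assumption for illustration: `cache_file_name_sketch` is a hypothetical function, sorting the lists so that argument order does not matter is a design guess, and the q_*.json naming is borrowed from the eviction note in an earlier commit. std's DefaultHasher is used only to keep the sketch dependency-free; its output is not guaranteed stable across Rust releases, so an on-disk cache would want a fixed hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Build a cache file name from the query plus every filter that affects results.
fn cache_file_name_sketch(
    query: &str,
    providers: &[&str],
    domains: &[&str],
    excludes: &[&str],
    freshness: Option<&str>,
) -> String {
    // Sort a copy so that "-p brave,exa" and "-p exa,brave" share one entry.
    fn sorted<'a>(items: &[&'a str]) -> Vec<&'a str> {
        let mut v = items.to_vec();
        v.sort_unstable();
        v
    }
    let mut h = DefaultHasher::new();
    query.hash(&mut h);
    sorted(providers).hash(&mut h);
    sorted(domains).hash(&mut h);
    sorted(excludes).hash(&mut h);
    freshness.hash(&mut h);
    format!("q_{:016x}.json", h.finish())
}
```

Because each list is hashed as a separate value, a domain passed as an include and the same domain passed as an exclude produce different keys.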
…rrors sanitize_brave_query() in brave.rs strips /path from site:domain/path and prepends path segments as regular query terms. extractSiteOperators() in search.ts extracts domain to -d flag for provider-agnostic defense.
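The rewrite described for sanitize_brave_query() can be sketched as follows. This is an approximation under assumptions, not the code in brave.rs: `sanitize_site_paths_sketch` is a hypothetical name, tokens are split on whitespace only, and a `site:` value that includes a URL scheme is not handled.

```rust
/// Rewrite `site:domain/path` to `site:domain` and move the path segments
/// to the front of the query as ordinary search terms.
fn sanitize_site_paths_sketch(query: &str) -> String {
    let mut path_terms: Vec<String> = Vec::new();
    let mut kept: Vec<String> = Vec::new();
    for tok in query.split_whitespace() {
        // Only tokens of the form site:<domain>/<path> are rewritten.
        match tok.strip_prefix("site:").and_then(|rest| rest.split_once('/')) {
            Some((domain, path)) => {
                kept.push(format!("site:{domain}"));
                path_terms.extend(
                    path.split('/')
                        .filter(|s| !s.is_empty())
                        .map(String::from),
                );
            }
            None => kept.push(tok.to_string()),
        }
    }
    // Path segments are prepended, followed by the remaining tokens in order.
    path_terms.extend(kept);
    path_terms.join(" ")
}
```

For example, `site:docs.rs/tokio/latest async runtime` becomes `tokio latest site:docs.rs async runtime`, while a query without a path in its site operator passes through unchanged.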
Summary
This PR started as a migration from rquest (BoringSSL) to wreq (stable v5, OpenSSL) but grew into a comprehensive hardening, feature expansion, and critical decompression fix.

Critical Fixes
- rquest → wreq: resolves the BoringSSL vs OpenSSL linking conflict on Linux (closes "The rquest HTTP client has been renamed to wreq" #4)
- Gzip decompression: simd_json::from_slice tried to parse the raw compressed bytes of gzip-encoded responses. Two-part fix:
  - Add the gzip feature to reqwest in Cargo.toml, which enables automatic response decompression for ALL reqwest-based providers
  - Add an Accept-Encoding: gzip header to all affected endpoints, which ensures deterministic CDN behavior

New Features

Reliability & Observability
- min_results enforcement
- providers_failed_detail taxonomy with cause/action/signature fields
- sanitize_argv for safe CLI argument handling

Provider Gzip Safety Audit
All providers audited for gzip decompression safety, grouped by response-parsing path:
- .bytes() + simd_json (four entries)
- .json()
- .text()

Testing