fix: wreq migration, gzip decompression fix, Parallel/You providers, reliability hardening #6

Open
Zireael wants to merge 20 commits into paperfoot:master from Zireael:fix/rquest-to-wreq-migration

Conversation

Zireael commented Apr 23, 2026

Summary

This PR started as a migration from rquest (BoringSSL) to wreq (stable v5, OpenSSL) but grew to include broad reliability hardening, two new providers, and a critical decompression fix.

Critical Fixes

  • HTTP client migration: rquest → wreq (resolves BoringSSL vs OpenSSL linking conflict on Linux; closes #4, "The rquest HTTP client has been renamed to wreq")
  • Gzip decompression fix: Brave, Serper, Exa, and xAI providers were silently failing because CDNs return gzip-compressed responses that simd_json::from_slice tried to parse as raw compressed bytes. Two-part fix:
    1. Added gzip feature to reqwest in Cargo.toml — enables automatic response decompression for ALL reqwest-based providers
    2. Added Accept-Encoding: gzip header to all affected endpoints — ensures deterministic CDN behavior
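
To illustrate the failure mode described above: a gzip stream always begins with the bytes 0x1f 0x8b, and no JSON document can start with 0x1f, so a parser handed the still-compressed body fails on the first byte. The sketch below is stdlib-only; the helper names `looks_gzipped` and `could_start_json` are hypothetical and are not part of this PR or of reqwest/simd_json.

```rust
/// First two bytes of every gzip stream (RFC 1952: ID1 = 0x1f, ID2 = 0x8b).
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];

/// Returns true when `body` still looks gzip-compressed.
/// Hypothetical helper for illustration only.
fn looks_gzipped(body: &[u8]) -> bool {
    body.starts_with(&GZIP_MAGIC)
}

/// Returns true when the first non-whitespace byte could begin a JSON value.
/// A compressed body fails this check because 0x1f is not a valid JSON start.
fn could_start_json(body: &[u8]) -> bool {
    let first = body
        .iter()
        .copied()
        .find(|b| !matches!(*b, b' ' | b'\t' | b'\n' | b'\r'));
    matches!(
        first,
        Some(b'{' | b'[' | b'"' | b'-' | b'0'..=b'9' | b't' | b'f' | b'n')
    )
}
```

A check like `looks_gzipped(&bytes)` before parsing could turn a confusing json_error into an explicit "response was not decompressed" diagnostic; whether the providers do this is not stated in the PR.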

New Features

  • Parallel provider: Multi-provider fan-out search with result deduplication
  • You.com provider: Free-tier web search API integration

Reliability & Observability

  • Exponential retry (3×, 1–4s backoff), per-provider timeout, min_results enforcement
  • Structured providers_failed_detail taxonomy with cause/action/signature fields
  • Env-driven tracing subscriber with quiet default
  • Skipped provider tracking and reporting
  • sanitize_argv for safe CLI argument handling
  • P0-P3 code review findings addressed
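
As a rough illustration of the retry policy listed above (3 retries, 1–4s backoff), the delay schedule can be sketched with the standard library alone. The commit log says the real retry path uses the backon crate; the constants and the `backoff_delay` function here are hypothetical, and whether the real policy adds jitter is not stated.

```rust
use std::time::Duration;

// Assumed values, taken from the "3×, 1–4s backoff" summary.
const MAX_RETRIES: u32 = 3;
const MIN_DELAY: Duration = Duration::from_secs(1);
const MAX_DELAY: Duration = Duration::from_secs(4);

/// Delay to wait before retry number `retry` (0-based), or `None` once the
/// retry budget is exhausted. The delay doubles each time and is capped.
fn backoff_delay(retry: u32) -> Option<Duration> {
    if retry >= MAX_RETRIES {
        return None;
    }
    let delay = MIN_DELAY * 2u32.pow(retry);
    Some(delay.min(MAX_DELAY))
}
```

Under these assumptions the three retries wait 1s, 2s, and 4s, so a provider that keeps failing adds at most 7s of sleep before it is reported in providers_failed_detail.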

Provider Gzip Safety Audit

All providers audited for gzip decompression safety:

| Provider | Pattern | Status |
|---|---|---|
| brave | .bytes() + simd_json | ✅ Fixed (gzip feature + Accept-Encoding header) |
| serper | .bytes() + simd_json | ✅ Fixed (Accept-Encoding header) |
| exa | .bytes() + simd_json | ✅ Fixed (Accept-Encoding header) |
| xai | .bytes() + simd_json | ✅ Fixed (Accept-Encoding header) |
| tavily, perplexity, you, jina, serpapi, parallel, firecrawl | .json() | ✅ Safe (reqwest handles decompression) |
| stealth | wreq (not reqwest) | ✅ Safe (own Accept-Encoding header) |
| browserless | .text() | ✅ Safe (HTML responses, not JSON) |

Testing

  • 36 unit tests passing
  • End-to-end verified: Brave provider returns 20 results with 0 json_errors
  • MCP tool integration verified working

Zireael added 7 commits April 19, 2026 07:34
…ability

Config, cache, timeout, and rejection-diagnostics hardening:

- config: type-numeric writes for settings.timeout/count (hbq1), legacy
  quoted-numeric coercion (hbq2)
- cache: skip caching all-provider-failed and degraded-empty responses (hbq3)
- engine: unified timeout budget from settings.timeout (hbq5), remove
  special-mode literals (hbq6), provider count clamping for Brave cap (hbq7)
- types/errors: structured providers_failed_detail taxonomy with cause/action/
  signature fields, backward-compatible (hbq4, hbq13, hbq14)
- providers: spawn_blocking extraction offload in stealth/browserless (hbq9),
  Exa NUM_RESULTS_EXCEEDED and Jina Cloudflare-1010/Browserless auth-mode
  rejection classification (hbq13)
- main/logging: env-driven tracing subscriber with quiet default, structured
  reliability events (hbq8)
- README: troubleshooting rejection diagnostics section (hbq15)
- clippy cleanup: unused vars, range pattern, test module ordering
- build: fix backon v1 retry callback (use .notify() on retry future)
The rquest HTTP client crate has been renamed to wreq, and the old
packages will be yanked. This commit migrates all references:

- Cargo.toml: rquest -> wreq v5, rquest-util -> wreq-util v2
- src/errors.rs: SearchError::Rquest -> SearchError::Wreq
- src/providers/stealth.rs: imports and types updated
- src/engine.rs: error variant match updated
- .github/workflows/release.yml: comment updated

Uses wreq v5.3.0 + wreq-util v2.2.6 (both stable), which provide
the same v5 API as rquest — purely a crate rename, no behavior change.

Closes paperfoot#4
Cherry-picked from andrey-golovko/search-cli fix/linux-build branch.
- Remove readability crate (pulled reqwest with native-tls/OpenSSL)
- Replace readability extraction with tl-based title + tag-stripping fallback
- Keep spawn_blocking offload for extraction from reliability hardening PR
- self_update: default-features = false to avoid native-tls
Cherry-picked from mouse-value-add/search-cli feat/you-search-provider.
- New You.com provider with general search and news search
- Freshness mapping, domain include/exclude filters
- Auth, API status, and rate-limit error handling
- Wired into engine routing, config, CLI, and docs
Bug fixes:
- browserless extract_text_simple now skips <script>/<style> content
- Extract/Scrape chain uses shared deadline to prevent timeout overflow
- stealth provider maps HTTP errors as SearchError::Api (not Config)
- finalize_response() wired into execute_search return path
- retry_request .when() now also matches SearchError::Wreq errors
- Cross-platform home_dir() resolves $HOME on Unix, %USERPROFILE% on Windows
- Cache write failures now log warnings instead of silent ignore
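
The shared-deadline fix for the Extract/Scrape chain can be pictured as follows: instead of giving each step its own full timeout, every step asks one deadline how much time is left. This is a minimal stdlib sketch; the `Deadline` type and its methods are hypothetical names, not the ones used in engine.rs.

```rust
use std::time::{Duration, Instant};

/// One deadline shared by every step of a fallback chain.
/// Hypothetical type for illustration.
struct Deadline {
    at: Instant,
}

impl Deadline {
    /// Starts the clock: the whole chain must finish within `budget`.
    fn after(budget: Duration) -> Self {
        Deadline { at: Instant::now() + budget }
    }

    /// Time left for the next step, or `None` if the budget is spent.
    /// Each step should use this as its own timeout, so the chain as a
    /// whole can never run longer than the original budget.
    fn remaining(&self) -> Option<Duration> {
        let left = self.at.saturating_duration_since(Instant::now());
        if left.is_zero() {
            None
        } else {
            Some(left)
        }
    }
}
```

With per-step timeouts, a chain of N providers can take up to N times the configured timeout; with a shared deadline, a later step is skipped once `remaining()` returns `None`.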

DRY refactoring:
- Shared augment_query() extracted to providers/mod.rs (3 copies removed)
- Shared map_freshness() extracted to types.rs (2 copies removed)
- Shared extract_title() extracted to providers/mod.rs (2 copies removed)
- Shared epoch_days_to_date() extracted to utils.rs (2 copies removed)
- execute_special refactored with try_provider/try_provider_remaining helpers
  (~160 lines of boilerplate eliminated)

Cleanup:
- Removed unused Provider::timeout() trait method + 12 provider impls
- Removed build_providers() call from execute_special (avoids unused instances)

Enhancements:
- Cache file eviction on startup removes expired q_*.json files

Test coverage:
- 13 classify tests (social/news/academic/scholar/patents/people/extract/
  similar/images/places/general/priority + 12 SE-focused)
- 3 engine tests (normalize_url, provider_allowed)
- 4 browserless extract_text_simple tests (script/style skip)
- 5 you.com provider tests (JSON deserialization)
- 8 cache logic tests (should_cache_query_response, path determinism)
- 5 additional normalize_url edge cases

95 tests pass, 0 clippy warnings.
Zireael changed the title from "fix: migrate rquest to wreq (stable v5)" to "fix: migrate rquest to wreq + engineering review hardening (17 fixes)" on Apr 25, 2026
- Rename misleading variable name in test_failure_metadata_includes_api_reason_and_legacy_list
- Variable actually holds browserless entry but was named stealth_detail (copy-paste error)
- Also include other code review fixes from uncommitted changes
Zireael changed the title from "fix: migrate rquest to wreq + engineering review hardening (17 fixes)" to "fix: reliability hardening, dedup, and wreq migration" on Apr 28, 2026
Zireael and others added 5 commits April 30, 2026 00:38
- Add parallel to cli.rs PROVIDERS section (12->13 providers)
- Add parallel to -p/--providers help text
- Add parallel to config show configured providers list
- Move -q/--query from args to options in agent-info schema (P1-02)
Co-authored-by: Atlas <atlas@ohmyopencode.ai>
Add providers_skipped vector to distinguish between providers that are
not configured (skipped) versus those that errored during execution.
This improves error reporting clarity in search results by showing
which providers were skipped due to missing API keys.

- Add providers_skipped tracking in engine.rs for both regular and special searches
- Report brave/serper skip status in auto mode when API keys are missing
- Update errors.rs with new skip tracking in SearchResult
- P0: Fix providers_skipped declaration order in engine.rs
- P0: Add serde(default) to providers_skipped in types.rs
- P0: Add providers_skipped to minimal_response helper in cache.rs
- P0: Add serde(skip_serializing_if) to ErrorDetail fields in types.rs
- P0: Guard save_last/save_query with should_cache_query_response in main.rs
- P1: Remove timing assertions from cache tests (brittle on CI)
- P1: Move Tavily API key from POST body to Authorization header
- P1: Fix ErrorDetail fields to serialize as null (not omitted) for test compatibility
- P2: Fix normalize_url uppercase WWW. handling in engine.rs
- P2: Capture email in verify.rs error path for semaphore failures
- P2: Add server_error to retry_request predicate in providers/mod.rs
- P2: Add SSRF URL validation to providers/stealth.rs
- P2: Add domain injection protection to augment_query in providers/mod.rs
- P2: Redact API keys in error display and config output
- P3: Remove redundant sanitize_url_error unused function
- P3: Fix provider count in help output test (12->13)
- P3: Fix test_error_response_includes_actionable_rejection_fields

All 127 tests pass (91 unit + 36 integration).
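
For context on the normalize_url fix listed above: result deduplication needs a key under which `https://WWW.Example.com/page/` and `http://example.com/page` compare equal. The sketch below is one way to build such a key with the standard library; it is an assumption about the intent, not the engine.rs implementation, and it deliberately leaves the path's case untouched.

```rust
/// Builds a deduplication key from a URL: drops the scheme, lowercases the
/// host, strips a leading "www." (in any case), and trims trailing slashes.
/// Illustrative only; the real normalize_url may apply different rules.
fn normalize_url(url: &str) -> String {
    let trimmed = url.trim();

    // Drop "http://", "HTTPS://", etc.
    let without_scheme = match trimmed.find("://") {
        Some(i) => &trimmed[i + 3..],
        None => trimmed,
    };

    // Split the host from the path, query, or fragment that follows it.
    let split_at = without_scheme.find(|c| matches!(c, '/' | '?' | '#'));
    let (host, rest) = match split_at {
        Some(i) => without_scheme.split_at(i),
        None => (without_scheme, ""),
    };

    // Lowercase first so "WWW." and "www." are handled identically.
    let host_lower = host.to_ascii_lowercase();
    let host_key = host_lower
        .strip_prefix("www.")
        .unwrap_or(host_lower.as_str());

    format!("{}{}", host_key, rest.trim_end_matches('/'))
}
```

Lowercasing before stripping the prefix is the step that makes the uppercase WWW. case work; stripping first would only match the lowercase spelling.
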
…le README with codebase

- Add sanitize_argv() to strip JS null/undefined args before Clap parsing
- Add Parallel provider (api.parallel.ai) throughout README: headline,
  providers table, mode→provider mappings, env vars, quick start
- Document all CLI subcommands: providers, skill, config path, verify, update
- Document search flags: --freshness, --domain, --exclude-domain, --last
- Add Reliability section: retry (3x, 1-4s backoff), provider_timeout, min_results
- Add Caching section: 5-min TTL, failure exclusion, --last flag
- Document agent integration assets: SKILL.md and OpenCode tool schema
- Update Cargo.toml description: 12→13 providers, drop email verification
- Add .cargo/ to .gitignore
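
A minimal sketch of the sanitize_argv() idea from the list above: JavaScript callers that interpolate missing values can pass the literal strings "null" or "undefined" as arguments, and these are dropped before the argument parser sees them. The signature and filtering rule here are assumptions; the real function may also handle cases such as a flag left without its value.

```rust
/// Removes arguments that are the literal strings "null" or "undefined".
/// Assumed behaviour for illustration; not the function from this PR.
fn sanitize_argv<I>(args: I) -> Vec<String>
where
    I: IntoIterator<Item = String>,
{
    args.into_iter()
        .filter(|arg| !matches!(arg.as_str(), "null" | "undefined"))
        .collect()
}
```

In a binary this would typically wrap `std::env::args()` and feed the result to the parser. Note that this simple version turns `--domain undefined` into a bare `--domain`, which a parser would still reject.
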
Zireael changed the title from "fix: reliability hardening, dedup, and wreq migration" to "feat: wreq migration, Parallel/You providers, reliability hardening, and full docs reconciliation" on May 9, 2026
Zireael added 3 commits May 9, 2026 11:29
…ovider

The Brave Search API requires gzip-compressed responses (per their docs),
but reqwest was compiled without the gzip feature, causing json_error
failures when Brave's CDN returned compressed responses that simd_json
could not parse.

- Add 'gzip' to reqwest features in Cargo.toml (enables automatic
  Accept-Encoding header sending and response decompression)
- Add explicit Accept-Encoding: gzip header to all 3 Brave endpoint
  request builders for deterministic CDN behavior
- Fixes search-cli-lpc.1 and search-cli-lpc.2
- Delete root SKILL.md, already moved to assets/.agents/skills/search-cli/SKILL.md
- Update include_str! path in src/cli.rs to match new location
These three providers use the same .bytes()+simd_json pattern as Brave
and are vulnerable to gzip-compressed responses being parsed as raw
bytes. Adding Accept-Encoding: gzip ensures deterministic CDN behavior
and pairs with the reqwest gzip feature (already enabled) for automatic
decompression.

Spike search-cli-lpc.3 findings:
- 3 vulnerable: serper, exa, xai (.bytes()+simd_json)
- 7 safe: tavily, perplexity, you, jina, serpapi, parallel, firecrawl (.json())
- stealth: safe (uses wreq, not reqwest, has its own Accept-Encoding)
Zireael changed the title from "feat: wreq migration, Parallel/You providers, reliability hardening, and full docs reconciliation" to "fix: wreq migration, gzip decompression fix, Parallel/You providers, reliability hardening" on May 9, 2026
Zireael added 4 commits May 9, 2026 14:54
… search-cli optimization program

- Add you.com web+news search provider (src/providers/you.rs, src/main.rs, src/types.rs)
- Integrate you into default Auto/General/News/Deep provider sets (src/engine.rs)
- Add config key YOU_API_KEY and CLI validation (src/config.rs, src/cli.rs)
- Add coding research skill: agent guidance for search tool usage with strategy tables,
  query playbook, wrapper contract reference, and refactor notes
- Add comprehensive improvement recommendations document (34→33 items across P0/P1/P2)
- Create beads implementation plan (epic + 4 child phase beads with dependency graph)
- Add AGENTS.md and CLAUDE.md project configuration
- search.ts: add current_date (YYYY-MM-DD) to all tool output paths
  (providers, error, and success responses) and to toolDebug block.
  Add AGENT RULES #11 telling agents to check current_date
  before shaping queries to avoid outdated-year searches.

- SKILL.md: add Hard rules #5 (Check current_date) and a
  Query shaping rule warning against hard-coding years, with
  anti-pattern examples and correct pattern guidance.

Closes: search-cli-qs3, search-cli-2kc
…RLs in classifier

- Extend query_cache_path/save/load signatures with providers, domains,
  excludes, and freshness so different query configs do not collide
- Remove old guard that only cached default (no-provider) queries;
  always try cache with extended key instead
- Add URL detection (http/https/ftp) to classify intent as Extract
  before regex matching
- Add security regex patterns (CVE, advisory, exploit, etc.) for future use
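
The extended cache key described above can be sketched as follows: every input that changes the result set goes into the key, so a query restricted to one provider never reuses the cached answer of the default query. The argument list, the FNV-1a hash, and the field separator are all assumptions for illustration; only the q_*.json file naming comes from this PR's commit log.

```rust
use std::path::PathBuf;

/// 64-bit FNV-1a, written out by hand so the file name is stable across
/// runs and compiler versions.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Hypothetical version of query_cache_path with the extended key.
fn query_cache_path(
    cache_dir: &str,
    query: &str,
    providers: &[&str],
    domains: &[&str],
    excludes: &[&str],
    freshness: Option<&str>,
) -> PathBuf {
    // Sort each list so "-p brave,exa" and "-p exa,brave" share one entry.
    let join = |items: &[&str]| {
        let mut sorted = items.to_vec();
        sorted.sort_unstable();
        sorted.join(",")
    };
    // 0x1f (unit separator) keeps adjacent fields from running together.
    let key = format!(
        "{}\x1f{}\x1f{}\x1f{}\x1f{}",
        query,
        join(providers),
        join(domains),
        join(excludes),
        freshness.unwrap_or("")
    );
    PathBuf::from(cache_dir).join(format!("q_{:016x}.json", fnv1a64(key.as_bytes())))
}
```

Because the key now separates configurations, the old guard that cached only default queries is no longer needed to avoid collisions, which matches the second bullet above.
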
…rrors

sanitize_brave_query() in brave.rs strips /path from site:domain/path
and prepends path segments as regular query terms. extractSiteOperators()
in search.ts extracts domain to -d flag for provider-agnostic defense.
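
A rough sketch of the query rewrite described above, assuming whitespace-separated tokens and a plain `site:domain/path` operator. It is not the brave.rs implementation; inputs such as `site:https://example.com/x` are not handled here.

```rust
/// Rewrites `site:domain/path` to `site:domain` and moves the path segments
/// to the front of the query as ordinary search terms.
/// Illustrative only; the real sanitize_brave_query may differ.
fn sanitize_brave_query(query: &str) -> String {
    let mut path_terms: Vec<String> = Vec::new();
    let mut kept: Vec<String> = Vec::new();

    for token in query.split_whitespace() {
        // Only tokens of the form "site:<domain>/<path>" are rewritten.
        let site_with_path = token
            .strip_prefix("site:")
            .and_then(|rest| rest.split_once('/'));
        match site_with_path {
            Some((domain, path)) => {
                kept.push(format!("site:{}", domain));
                path_terms.extend(
                    path.split('/')
                        .filter(|segment| !segment.is_empty())
                        .map(String::from),
                );
            }
            None => kept.push(token.to_string()),
        }
    }

    // Path segments come first, followed by the rest of the query.
    path_terms.extend(kept);
    path_terms.join(" ")
}
```

For example, `site:docs.rs/tokio/latest async runtime` becomes `tokio latest site:docs.rs async runtime`, while a query whose site: operator has no path is returned unchanged.
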

Development

Successfully merging this pull request may close these issues.

The rquest HTTP client has been renamed to wreq