fix: migrate rquest to wreq, add gzip decompression, reliability hardening, Parallel+You providers #1
Closed
Zireael wants to merge 16 commits into
…ability

Config, cache, timeout, and rejection-diagnostics hardening:
- config: type-numeric writes for settings.timeout/count (hbq1), legacy quoted-numeric coercion (hbq2)
- cache: skip caching all-provider-failed and degraded-empty responses (hbq3)
- engine: unified timeout budget from settings.timeout (hbq5), remove special-mode literals (hbq6), provider count clamping for Brave cap (hbq7)
- types/errors: structured providers_failed_detail taxonomy with cause/action/signature fields, backward-compatible (hbq4, hbq13, hbq14)
- providers: spawn_blocking extraction offload in stealth/browserless (hbq9), Exa NUM_RESULTS_EXCEEDED and Jina Cloudflare-1010/Browserless auth-mode rejection classification (hbq13)
- main/logging: env-driven tracing subscriber with quiet default, structured reliability events (hbq8)
- README: troubleshooting rejection diagnostics section (hbq15)
- clippy cleanup: unused vars, range pattern, test module ordering
- build: fix backon v1 retry callback (use .notify() on retry future)

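The cache rule in this commit (skip all-provider-failed and degraded-empty responses) can be sketched roughly as below. This is a minimal illustration, not the code from the PR: the `Response` struct, its field names, and the exact definition of "degraded-empty" (no results while at least one provider failed) are assumptions.

```rust
/// Hypothetical stand-in for a search response; field names are assumptions.
#[derive(Debug, Default)]
struct Response {
    results: Vec<String>,
    providers_queried: Vec<String>,
    providers_failed: Vec<String>,
}

/// Returns true only when the response is worth writing to the cache.
fn should_cache(resp: &Response) -> bool {
    // Every queried provider failed: caching would replay the outage until the TTL expires.
    let all_failed = !resp.providers_queried.is_empty()
        && resp.providers_failed.len() >= resp.providers_queried.len();
    // Assumed meaning of "degraded-empty": nothing came back and at least one provider failed.
    let degraded_empty = resp.results.is_empty() && !resp.providers_failed.is_empty();
    !(all_failed || degraded_empty)
}
```

Under these assumptions, an empty result set with no failures is still cached, since it reflects a genuine "no hits" answer rather than an outage.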
The rquest HTTP client crate has been renamed to wreq, and the old packages will be yanked. This commit migrates all references:
- Cargo.toml: rquest -> wreq v5, rquest-util -> wreq-util v2
- src/errors.rs: SearchError::Rquest -> SearchError::Wreq
- src/providers/stealth.rs: imports and types updated
- src/engine.rs: error variant match updated
- .github/workflows/release.yml: comment updated

Uses wreq v5.3.0 + wreq-util v2.2.6 (both stable), which provide the same v5 API as rquest; this is purely a crate rename with no behavior change.

Closes paperfoot#4

Cherry-picked from andrey-golovko/search-cli fix/linux-build branch.
- Remove readability crate (pulled reqwest with native-tls/OpenSSL)
- Replace readability extraction with tl-based title + tag-stripping fallback
- Keep spawn_blocking offload for extraction from reliability hardening PR
- self_update: default-features = false to avoid native-tls

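To give a feel for what a tag-stripping fallback involves, here is a std-only sketch. It is not the extraction code from this branch (which uses the tl crate for titles); the function name `strip_tags_simple` is hypothetical, and the handling of `<script>`/`<style>` is a simple prefix check chosen for brevity.

```rust
/// Hypothetical tag stripper: drops markup, and drops the *contents* of
/// <script> and <style> elements. Illustrative only; not an HTML parser.
fn strip_tags_simple(html: &str) -> String {
    // ASCII lowercasing keeps byte offsets identical to `html`.
    let lower = html.to_ascii_lowercase();
    let mut out = String::new();
    let mut i = 0;
    while i < html.len() {
        if html.as_bytes()[i] == b'<' {
            for tag in &["script", "style"] {
                // Prefix check only, so a tag such as <scripted> would also match.
                if lower[i + 1..].starts_with(tag) {
                    let close = format!("</{}", tag);
                    // Jump to the closing tag, or to the end if it is missing.
                    i = match lower[i..].find(&close) {
                        Some(pos) => i + pos,
                        None => html.len(),
                    };
                    break;
                }
            }
            // Skip to just past the end of the current tag.
            i = match html[i..].find('>') {
                Some(pos) => i + pos + 1,
                None => html.len(),
            };
            out.push(' ');
        } else {
            // Copy text up to the next tag.
            let next = html[i..].find('<').map_or(html.len(), |p| i + p);
            out.push_str(&html[i..next]);
            i = next;
        }
    }
    // Collapse runs of whitespace left behind by removed tags.
    out.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

A fallback like this trades accuracy for a smaller dependency tree: it does not decode entities or understand comments, which is usually acceptable for producing a plain-text snippet.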
Cherry-picked from mouse-value-add/search-cli feat/you-search-provider.
- New You.com provider with general search and news search
- Freshness mapping, domain include/exclude filters
- Auth, API status, and rate-limit error handling
- Wired into engine routing, config, CLI, and docs

Bug fixes:
- browserless extract_text_simple now skips <script>/<style> content
- Extract/Scrape chain uses shared deadline to prevent timeout overflow
- stealth provider maps HTTP errors as SearchError::Api (not Config)
- finalize_response() wired into execute_search return path
- retry_request .when() now also matches SearchError::Wreq errors
- Cross-platform home_dir() resolves $HOME on Unix, %USERPROFILE% on Windows
- Cache write failures now log warnings instead of being silently ignored

DRY refactoring:
- Shared augment_query() extracted to providers/mod.rs (3 copies removed)
- Shared map_freshness() extracted to types.rs (2 copies removed)
- Shared extract_title() extracted to providers/mod.rs (2 copies removed)
- Shared epoch_days_to_date() extracted to utils.rs (2 copies removed)
- execute_special refactored with try_provider/try_provider_remaining helpers (~160 lines of boilerplate eliminated)

Cleanup:
- Removed unused Provider::timeout() trait method + 12 provider impls
- Removed build_providers() call from execute_special (avoids unused instances)

Enhancements:
- Cache file eviction on startup removes expired q_*.json files

Test coverage:
- 13 classify tests (social/news/academic/scholar/patents/people/extract/similar/images/places/general/priority + 12 SE-focused)
- 3 engine tests (normalize_url, provider_allowed)
- 4 browserless extract_text_simple tests (script/style skip)
- 5 you.com provider tests (JSON deserialization)
- 8 cache logic tests (should_cache_query_response, path determinism)
- 5 additional normalize_url edge cases

95 tests pass, 0 clippy warnings.

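The shared-deadline fix for the Extract/Scrape chain can be pictured as follows: instead of giving each step its own full timeout, every step receives only what is left of one budget. This is a sketch under assumptions; `step_timeout` and the per-step cap are hypothetical names, not identifiers from the PR.

```rust
use std::time::{Duration, Instant};

/// Timeout to hand to the next step in a chain that shares one deadline.
/// Returns None once the shared budget is spent, so the caller can stop early.
fn step_timeout(deadline: Instant, now: Instant, per_step_cap: Duration) -> Option<Duration> {
    // `checked_duration_since` yields None if `now` is already past the deadline.
    let left = deadline.checked_duration_since(now)?;
    if left.is_zero() {
        return None;
    }
    // Never exceed the per-step cap, and never exceed what is left overall.
    Some(left.min(per_step_cap))
}
```

With a 10 s budget and a 6 s per-step cap, the first step gets 6 s; if it uses all of it, the second step gets only the remaining 4 s, so the chain as a whole cannot overrun 10 s.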
- Rename misleading variable name in test_failure_metadata_includes_api_reason_and_legacy_list
- Variable actually holds browserless entry but was named stealth_detail (copy-paste error)
- Also include other code review fixes from uncommitted changes

- Add parallel to cli.rs PROVIDERS section (12->13 providers)
- Add parallel to -p/--providers help text
- Add parallel to config show configured providers list
- Move -q/--query from args to options in agent-info schema (P1-02)

Co-authored-by: Atlas <atlas@ohmyopencode.ai>
Add providers_skipped vector to distinguish between providers that are not configured (skipped) versus those that errored during execution. This improves error reporting clarity in search results by showing which providers were skipped due to missing API keys.
- Add providers_skipped tracking in engine.rs for both regular and special searches
- Report brave/serper skip status in auto mode when API keys are missing
- Update errors.rs with new skip tracking in SearchResult

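The skipped-versus-failed distinction might look something like this in outline. The sketch is std-only and hypothetical: `Outcome`, `run_providers`, and the key map are illustrative names, and only `providers_skipped` corresponds to a field named in the commit.

```rust
use std::collections::HashMap;

/// Hypothetical summary of one search run.
#[derive(Debug, Default, PartialEq)]
struct Outcome {
    providers_ok: Vec<String>,
    providers_failed: Vec<String>,
    /// Providers never called because no API key was configured.
    providers_skipped: Vec<String>,
}

/// Calls each wanted provider that has a key; records the rest as skipped.
fn run_providers(
    wanted: &[&str],
    api_keys: &HashMap<&str, &str>,
    call: impl Fn(&str, &str) -> Result<(), String>,
) -> Outcome {
    let mut out = Outcome::default();
    for &name in wanted {
        match api_keys.get(name) {
            // Not configured: this is not an error, so keep it out of the failed list.
            None => out.providers_skipped.push(name.to_string()),
            Some(key) => match call(name, key) {
                Ok(()) => out.providers_ok.push(name.to_string()),
                Err(_) => out.providers_failed.push(name.to_string()),
            },
        }
    }
    out
}
```

Keeping the two lists separate lets a caller tell "add an API key" apart from "the provider is down", which call for different remedies.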
- P0: Fix providers_skipped declaration order in engine.rs
- P0: Add serde(default) to providers_skipped in types.rs
- P0: Add providers_skipped to minimal_response helper in cache.rs
- P0: Add serde(skip_serializing_if) to ErrorDetail fields in types.rs
- P0: Guard save_last/save_query with should_cache_query_response in main.rs
- P1: Remove timing assertions from cache tests (brittle on CI)
- P1: Move Tavily API key from POST body to Authorization header
- P1: Fix ErrorDetail fields to serialize as null (not omitted) for test compatibility
- P2: Fix normalize_url uppercase WWW. handling in engine.rs
- P2: Capture email in verify.rs error path for semaphore failures
- P2: Add server_error to retry_request predicate in providers/mod.rs
- P2: Add SSRF URL validation to providers/stealth.rs
- P2: Add domain injection protection to augment_query in providers/mod.rs
- P2: Redact API keys in error display and config output
- P3: Remove redundant sanitize_url_error unused function
- P3: Fix provider count in help output test (12->13)
- P3: Fix test_error_response_includes_actionable_rejection_fields

All 127 tests pass (91 unit + 36 integration).

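As an illustration of the kind of check SSRF URL validation typically performs, the sketch below rejects hosts that point at loopback, private, or link-local addresses. It is an assumption about the general technique, not the validation added to providers/stealth.rs; `is_blocked_host` is a hypothetical name, and the function takes an already-extracted host rather than a full URL.

```rust
use std::net::IpAddr;

/// Hypothetical SSRF guard: true if `host` should not be fetched.
/// Only literal IPs and localhost names are checked here; a fuller check
/// would also resolve DNS names and test the resolved addresses.
fn is_blocked_host(host: &str) -> bool {
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    if host == "localhost" || host.ends_with(".localhost") {
        return true;
    }
    // IPv6 literals appear in URLs wrapped in brackets, e.g. [::1].
    let bare = host.trim_start_matches('[').trim_end_matches(']');
    match bare.parse::<IpAddr>() {
        Ok(IpAddr::V4(ip)) => {
            ip.is_loopback() || ip.is_private() || ip.is_link_local() || ip.is_unspecified()
        }
        Ok(IpAddr::V6(ip)) => ip.is_loopback() || ip.is_unspecified(),
        // Not an IP literal: treated as a public hostname in this sketch.
        Err(_) => false,
    }
}
```

The link-local range matters in practice because cloud metadata endpoints such as 169.254.169.254 live there.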
…le README with codebase

- Add sanitize_argv() to strip JS null/undefined args before Clap parsing
- Add Parallel provider (api.parallel.ai) throughout README: headline, providers table, mode→provider mappings, env vars, quick start
- Document all CLI subcommands: providers, skill, config path, verify, update
- Document search flags: --freshness, --domain, --exclude-domain, --last
- Add Reliability section: retry (3x, 1-4s backoff), provider_timeout, min_results
- Add Caching section: 5-min TTL, failure exclusion, --last flag
- Document agent integration assets: SKILL.md and OpenCode tool schema
- Update Cargo.toml description: 12→13 providers, drop email verification
- Add .cargo/ to .gitignore

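A plausible shape for sanitize_argv() is shown below. The commit only says it strips JS null/undefined arguments before Clap parsing, so the signature and the decision to drop exactly the literal tokens "null" and "undefined" are assumptions.

```rust
/// Sketch of an argv filter: drops arguments that are the stringified
/// JavaScript values "null" or "undefined", which can leak in when a JS
/// caller interpolates a missing value into the command line.
fn sanitize_argv<I>(args: I) -> Vec<String>
where
    I: IntoIterator<Item = String>,
{
    args.into_iter()
        .filter(|a| !matches!(a.as_str(), "null" | "undefined"))
        .collect()
}
```

In a binary this would typically wrap `std::env::args()` and feed the result to the argument parser. One trade-off of this simple approach is that a user can no longer search for the literal word "null" as a standalone argument.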
…ovider

The Brave Search API requires gzip-compressed responses (per their docs), but reqwest was compiled without the gzip feature, causing json_error failures when Brave's CDN returned compressed responses that simd_json could not parse.
- Add 'gzip' to reqwest features in Cargo.toml (enables automatic Accept-Encoding header sending and response decompression)
- Add explicit Accept-Encoding: gzip header to all 3 Brave endpoint request builders for deterministic CDN behavior
- Fixes search-cli-lpc.1 and search-cli-lpc.2

- Delete root SKILL.md, already moved to assets/.agents/skills/search-cli/SKILL.md
- Update include_str! path in src/cli.rs to match new location

These three providers use the same .bytes()+simd_json pattern as Brave and are vulnerable to gzip-compressed responses being parsed as raw bytes. Adding Accept-Encoding: gzip ensures deterministic CDN behavior and pairs with the reqwest gzip feature (already enabled) for automatic decompression.

Spike search-cli-lpc.3 findings:
- 3 vulnerable: serper, exa, xai (.bytes()+simd_json)
- 7 safe: tavily, perplexity, you, jina, serpapi, parallel, firecrawl (.json())
- stealth: safe (uses wreq, not reqwest, has its own Accept-Encoding)

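To see why a compressed body fails JSON parsing, note that a gzip stream starts with the magic bytes 0x1f 0x8b rather than `{` or `[`. The helper below is a hypothetical diagnostic, not something this PR adds; it shows how a still-compressed body could be told apart from malformed JSON.

```rust
/// Hypothetical diagnostic: true if `body` begins with the gzip magic bytes.
/// A body that passes this check was not decompressed before parsing.
fn looks_gzipped(body: &[u8]) -> bool {
    body.starts_with(&[0x1f, 0x8b])
}

/// Illustrative classification of a body that failed to parse as JSON.
fn describe_parse_failure(body: &[u8]) -> &'static str {
    if looks_gzipped(body) {
        "response is still gzip-compressed; decompression is not enabled"
    } else {
        "response is not valid JSON"
    }
}
```

A check like this could turn an opaque JSON error into an actionable message, although enabling decompression in the HTTP client, as the commits here do, removes the underlying problem.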
Superseded by upstream PR paperfoot#6; this fork-internal PR is redundant since the real PR targets the original repository.
Summary
Major stability and feature release addressing HTTP client migration, gzip decompression failures, reliability hardening, and two new search providers.
Critical Fixes
- `rquest` HTTP client has been renamed to `wreq` (paperfoot/search-cli#4).
- `gzip` feature + `Accept-Encoding: gzip` headers: Brave, Serper, Exa, and xAI providers were failing because CDNs return gzip-compressed responses that `simd_json::from_slice` tried to parse as raw bytes. The gzip feature enables automatic decompression; the explicit header ensures deterministic CDN behavior.

New Features
Reliability & Observability
- `sanitize_argv` for safe CLI argument handling

Provider Gzip Safety Audit
[Table: only the response-handling column survived extraction. Four rows use `.bytes()` + `simd_json`, one row uses `.json()`, and one row uses `.text()`.]

Testing
Commits (16)
- b61de95 fix: migrate rquest to wreq (stable v5), closes paperfoot/search-cli#4
- 378fbf8 fix: resolve BoringSSL vs OpenSSL linking conflict (PR paperfoot/search-cli#2)
- 659b00b feat: add you.com search provider (PR paperfoot/search-cli#3)
- 6e4034f feat: reliability hardening - timeout, cache, diagnostics, and observability
- 8bcea3b chore: remove .beads and docs directories from PR
- f9245ed fix: engineering review - 17 reliability, dedup, and coverage fixes
- 655f7b4 fix: rename stealth_detail to browserless_detail in test
- 6026d4c fix: P1-01 add missing parallel provider to help text and config show
- f33b4e0 test: add unit tests for helper functions and regression tests
- 4f56cf9 fix: track and report skipped providers separately from failed ones
- ca2a393 fix: address P0-P3 code review findings from ce-code-review
- 6de0768 feat: add Parallel provider, sanitize_argv, agent assets, and reconcile README
- 25f26ed fix: add reqwest gzip feature and Accept-Encoding header for Brave provider
- 3cc16e4 refactor: move SKILL.md to assets dir and update include path
- d92fc0b fix: add Accept-Encoding gzip header to serper, exa, xai providers