UN-3344 [FIX] Activate litellm retry for all LLM providers#1867
chandrasekharan-zipstack wants to merge 2 commits into main
Conversation
litellm's wrapper-level retry (`completion_with_retries`) works for all providers including httpx-based ones (Anthropic, Vertex, Bedrock, Mistral, Azure AI Foundry), but only activates when `num_retries` is set in kwargs. Our adapters pass `max_retries` (from user UI config), which only works for SDK-based providers (OpenAI, Azure); httpx-based providers silently ignored it, resulting in zero retries on transient errors (500, 502, 503).

Bridge the gap by copying the user's `max_retries` value into `num_retries` and setting `retry_strategy` to `exponential_backoff_retry` before calling `litellm.completion()`. litellm internally zeroes `max_retries` during wrapper retries to prevent double-retry with SDK providers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Greptile Summary

This PR fixes silent retry failures for httpx-based LLM providers (Anthropic, Vertex AI, Bedrock, Mistral) by bridging the user-configured `max_retries` value into litellm's wrapper-level retry mechanism.

Key changes:
Minor issues noted:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant LLM as LLM.complete() / stream_complete() / acomplete()
    participant Helper as _set_litellm_retry_params()
    participant LiteLLM as litellm (completion_with_retries)
    participant Provider as LLM Provider (Anthropic / OpenAI / Vertex / etc.)
    Caller->>LLM: complete(prompt, **kwargs)
    LLM->>LLM: adapter.validate({**self.kwargs, **kwargs})<br/>→ completion_kwargs (includes max_retries=N)
    LLM->>Helper: _set_litellm_retry_params(completion_kwargs)
    alt max_retries is truthy (N > 0)
        Helper->>Helper: num_retries = N
        Helper->>Helper: max_retries = 0
        Helper->>Helper: retry_strategy = "exponential_backoff_retry"
    else max_retries is falsy / not set
        Helper->>Helper: no-op
    end
    LLM->>LiteLLM: litellm.completion(messages, num_retries=N, max_retries=0, retry_strategy=...)
    loop Wrapper retry (up to num_retries times, exponential backoff: 1s→2s→4s→8s→10s)
        LiteLLM->>Provider: HTTP request (max_retries=0 → no SDK-level retry)
        alt Transient error (5xx, timeout)
            Provider-->>LiteLLM: Error
            LiteLLM->>LiteLLM: backoff, increment attempt
        else Success
            Provider-->>LiteLLM: Response
            LiteLLM-->>LLM: ModelResponse
        end
    end
    LLM-->>Caller: dict with response / LLMError on exhaustion
```
Prompt To Fix All With AI

This is a comment left during a code review.
Path: unstract/sdk1/src/unstract/sdk1/llm.py
Line: 474-478
Comment:
**Missing observability log for retry activation**
There's no log emitted when wrapper-level retries are activated. In production, operators troubleshooting retry behaviour (e.g. verifying the fix works as expected, or diagnosing slow completions caused by repeated backoff) would have no SDK-level trace of the configured `num_retries`. A debug log here would make it far easier to confirm the feature is engaged:
```suggestion
max_retries = completion_kwargs.get("max_retries")
if max_retries:
logger.debug(
"Activating litellm wrapper-level retries: num_retries=%s, "
"strategy=exponential_backoff_retry",
max_retries,
)
completion_kwargs["num_retries"] = max_retries
completion_kwargs["max_retries"] = 0
completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: unstract/sdk1/src/unstract/sdk1/llm.py
Line: 460-461
Comment:
**Loose type annotation on `completion_kwargs`**
The parameter is typed as a bare `dict` with no generic. This makes it harder for type checkers and readers to understand what keys/values are expected. Consider narrowing it to at least `dict[str, Any]` (or a `TypedDict` if one exists for completion kwargs) so callers get appropriate type-checking at the boundary:
```suggestion
def _set_litellm_retry_params(completion_kwargs: dict[str, Any]) -> None:
```
This would also require adding `from typing import Any` to the imports if it isn't already present.
How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "[FIX] Zero out max_r..."
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@unstract/sdk1/src/unstract/sdk1/llm.py`:
- Around line 473-476: The current truthy check for max_retries skips explicit
zero and permits negatives; change the branch to test "max_retries is not None",
validate that max_retries is an integer and >= 0 (or raise a ValueError for
invalid values), then set completion_kwargs["num_retries"] = max_retries and
completion_kwargs["retry_strategy"] = "exponential_backoff_retry"; apply this
logic around the max_retries handling (completion_kwargs, max_retries,
num_retries, retry_strategy) so zero is honored and negatives/non-integers are
rejected.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 07e1c53a-ce05-4ead-9621-c256d668fe67
📒 Files selected for processing (1)
unstract/sdk1/src/unstract/sdk1/llm.py
```python
max_retries = completion_kwargs.get("max_retries")
if max_retries:
    completion_kwargs["num_retries"] = max_retries
    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
```
Handle zero and invalid retry counts explicitly.
At Line 474, using a truthy check skips an explicit max_retries=0 and allows negative values through. Please branch on `is not None` and validate bounds before copying.
Suggested patch

```diff
 max_retries = completion_kwargs.get("max_retries")
-if max_retries:
-    completion_kwargs["num_retries"] = max_retries
-    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
+if max_retries is None:
+    return
+if not isinstance(max_retries, int) or max_retries < 0:
+    raise SdkError("Invalid max_retries: expected a non-negative integer")
+completion_kwargs["num_retries"] = max_retries
+if max_retries > 0:
+    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@unstract/sdk1/src/unstract/sdk1/llm.py` around lines 473 - 476, The current
truthy check for max_retries skips explicit zero and permits negatives; change
the branch to test "max_retries is not None", validate that max_retries is an
integer and >= 0 (or raise a ValueError for invalid values), then set
completion_kwargs["num_retries"] = max_retries and
completion_kwargs["retry_strategy"] = "exponential_backoff_retry"; apply this
logic around the max_retries handling (completion_kwargs, max_retries,
num_retries, retry_strategy) so zero is honored and negatives/non-integers are
rejected.
SDK-based providers (OpenAI, Azure) default to max_retries=2 internally even when not explicitly set. Without zeroing it, the first attempt exhausts SDK retries before the wrapper retry kicks in, multiplying total attempts (e.g. 5 SDK + 5 wrapper = 11 instead of expected 5). Setting max_retries=0 ensures all retries go through litellm's wrapper uniformly across all providers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Test Results

Summary
Runner Tests - Full Report
SDK1 Tests - Full Report
♻️ Duplicate comments (1)

unstract/sdk1/src/unstract/sdk1/llm.py (1)

474-478: ⚠️ Potential issue | 🟡 Minor

Handle retry bounds explicitly instead of truthy-checking.

On Line 475, truthy branching is brittle for retry config. Please branch on `is None` and validate non-negative integer input before setting retry fields.

Suggested patch

```diff
 max_retries = completion_kwargs.get("max_retries")
-if max_retries:
-    completion_kwargs["num_retries"] = max_retries
-    completion_kwargs["max_retries"] = 0
-    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
+if max_retries is None:
+    return
+if not isinstance(max_retries, int) or max_retries < 0:
+    raise SdkError("Invalid max_retries: expected a non-negative integer")
+completion_kwargs["num_retries"] = max_retries
+completion_kwargs["max_retries"] = 0
+if max_retries > 0:
+    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/sdk1/src/unstract/sdk1/llm.py` around lines 474 - 478, The current truthy check for max_retries is brittle; in unstract/sdk1/src/unstract/sdk1/llm.py locate the block using completion_kwargs and max_retries and change the branching to check "if max_retries is not None" then validate that max_retries is an integer and >= 0 (raise a ValueError if not), otherwise proceed to set completion_kwargs["num_retries"] = max_retries, completion_kwargs["max_retries"] = 0 and completion_kwargs["retry_strategy"] = "exponential_backoff_retry"; ensure invalid types or negative values are rejected with a clear error message referencing max_retries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@unstract/sdk1/src/unstract/sdk1/llm.py`:
- Around line 474-478: The current truthy check for max_retries is brittle; in
unstract/sdk1/src/unstract/sdk1/llm.py locate the block using completion_kwargs
and max_retries and change the branching to check "if max_retries is not None"
then validate that max_retries is an integer and >= 0 (raise a ValueError if
not), otherwise proceed to set completion_kwargs["num_retries"] = max_retries,
completion_kwargs["max_retries"] = 0 and completion_kwargs["retry_strategy"] =
"exponential_backoff_retry"; ensure invalid types or negative values are
rejected with a clear error message referencing max_retries.



What

- Uses litellm's wrapper-level retry (`completion_with_retries`) to work for all providers
- Copies `max_retries` to the `num_retries` parameter before calling litellm
- Sets `retry_strategy` to `exponential_backoff_retry` for all LLM completion calls

Why

Users configure `max_retries` in the adapter UI (defaults 3-5), but this only works for SDK-based providers (OpenAI, Azure). For httpx-based providers (Anthropic, Vertex AI, Bedrock, Mistral, Azure AI Foundry), retries are silently ignored — zero retries on transient errors.

Production incident: Anthropic API 500 error after ~9 minutes caused immediate execution failure (ID: 23194211-ac04-442f-8a04-dde0ecf06195).

litellm has a separate wrapper-level retry mechanism that works for ALL providers, but only activates when `num_retries` is set. Our code never set it.

How

Modified `LLM.complete()`, `LLM.stream_complete()`, and `LLM.acomplete()` in the SDK to:

- Copy `max_retries` from kwargs to `num_retries`
- Set `retry_strategy` to `exponential_backoff_retry`

litellm internally sets `max_retries=0` during wrapper retries to prevent double-retry with SDK providers.

Can this PR break any existing features? If yes, please list possible items. If no, please explain why.

No. This only activates litellm's existing wrapper retry mechanism with user-configured values. SDK-based providers (OpenAI, Azure) continue to use their native retry via `max_retries` (litellm sets it to 0 during wrapper retries). httpx-based providers now benefit from proper retry handling instead of failing immediately on transient errors.

Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
Notes on Testing
The fix can be validated by checking prompt-service logs for retry-related entries when transient LLM errors occur. Previously, errors would show a single failure; with the fix, you should see multiple retry attempts before final failure. Exponential backoff is applied (1s, 2s, 4s, 8s, 10s cap).
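The schedule quoted above is consistent with a doubling delay capped at 10 seconds; a quick illustrative sketch (the exact formula litellm uses internally is an assumption here):

```python
def backoff_delays(num_retries: int, base: float = 1.0, cap: float = 10.0) -> list[float]:
    # Delay doubles on each retry attempt and is capped: 1s, 2s, 4s, 8s, then 10s.
    return [min(base * 2**attempt, cap) for attempt in range(num_retries)]


print(backoff_delays(5))  # → [1.0, 2.0, 4.0, 8.0, 10.0]
```

With `num_retries=5`, a request that keeps failing therefore spends roughly 25 seconds in backoff before the final error surfaces.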
Screenshots
Checklist
I have read and understood the Contribution Guidelines.