UN-3344 [FIX] Activate litellm retry for all LLM providers#1867
chandrasekharan-zipstack wants to merge 2 commits into main
Conversation
litellm's wrapper-level retry (`completion_with_retries`) works for all providers including httpx-based ones (Anthropic, Vertex, Bedrock, Mistral, Azure AI Foundry), but only activates when `num_retries` is set in kwargs. Our adapters pass `max_retries` (from user UI config), which only works for SDK-based providers (OpenAI, Azure); httpx-based providers silently ignored it, resulting in zero retries on transient errors (500, 502, 503).

Bridge the gap by copying the user's `max_retries` value into `num_retries` and setting `retry_strategy` to `exponential_backoff_retry` before calling `litellm.completion()`. litellm internally zeroes `max_retries` during wrapper retries to prevent double-retry with SDK providers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Greptile Summary

This PR fixes silent retry failures for httpx-based LLM providers (Anthropic, Vertex AI, Bedrock, Mistral) by bridging the user-configured `max_retries` value into litellm's wrapper-level retry mechanism.

Key changes:
Minor issues noted:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant LLM as LLM.complete() / stream_complete() / acomplete()
    participant Helper as _set_litellm_retry_params()
    participant LiteLLM as litellm (completion_with_retries)
    participant Provider as LLM Provider (Anthropic / OpenAI / Vertex / etc.)
    Caller->>LLM: complete(prompt, **kwargs)
    LLM->>LLM: adapter.validate({**self.kwargs, **kwargs})<br/>→ completion_kwargs (includes max_retries=N)
    LLM->>Helper: _set_litellm_retry_params(completion_kwargs)
    alt max_retries is truthy (N > 0)
        Helper->>Helper: num_retries = N
        Helper->>Helper: max_retries = 0
        Helper->>Helper: retry_strategy = "exponential_backoff_retry"
    else max_retries is falsy / not set
        Helper->>Helper: no-op
    end
    LLM->>LiteLLM: litellm.completion(messages, num_retries=N, max_retries=0, retry_strategy=...)
    loop Wrapper retry (up to num_retries times, exponential backoff: 1s→2s→4s→8s→10s)
        LiteLLM->>Provider: HTTP request (max_retries=0 → no SDK-level retry)
        alt Transient error (5xx, timeout)
            Provider-->>LiteLLM: Error
            LiteLLM->>LiteLLM: backoff, increment attempt
        else Success
            Provider-->>LiteLLM: Response
            LiteLLM-->>LLM: ModelResponse
        end
    end
    LLM-->>Caller: dict with response / LLMError on exhaustion
```
Prompt To Fix All With AI

This is a comment left during a code review.
Path: unstract/sdk1/src/unstract/sdk1/llm.py
Line: 474-478
Comment:
**Missing observability log for retry activation**
There's no log emitted when wrapper-level retries are activated. In production, operators troubleshooting retry behaviour (e.g. verifying the fix works as expected, or diagnosing slow completions caused by repeated backoff) would have no SDK-level trace of the configured `num_retries`. A debug log here would make it far easier to confirm the feature is engaged:
```suggestion
max_retries = completion_kwargs.get("max_retries")
if max_retries:
logger.debug(
"Activating litellm wrapper-level retries: num_retries=%s, "
"strategy=exponential_backoff_retry",
max_retries,
)
completion_kwargs["num_retries"] = max_retries
completion_kwargs["max_retries"] = 0
completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: unstract/sdk1/src/unstract/sdk1/llm.py
Line: 460-461
Comment:
**Loose type annotation on `completion_kwargs`**
The parameter is typed as a bare `dict` with no generic. This makes it harder for type checkers and readers to understand what keys/values are expected. Consider narrowing it to at least `dict[str, Any]` (or a `TypedDict` if one exists for completion kwargs) so callers get appropriate type-checking at the boundary:
```suggestion
def _set_litellm_retry_params(completion_kwargs: dict[str, Any]) -> None:
```
This would also require adding `from typing import Any` to the imports if it isn't already present.
How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "[FIX] Zero out max_r..."
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@unstract/sdk1/src/unstract/sdk1/llm.py`:
- Around line 473-476: The current truthy check for max_retries skips explicit
zero and permits negatives; change the branch to test "max_retries is not None",
validate that max_retries is an integer and >= 0 (or raise a ValueError for
invalid values), then set completion_kwargs["num_retries"] = max_retries and
completion_kwargs["retry_strategy"] = "exponential_backoff_retry"; apply this
logic around the max_retries handling (completion_kwargs, max_retries,
num_retries, retry_strategy) so zero is honored and negatives/non-integers are
rejected.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 07e1c53a-ce05-4ead-9621-c256d668fe67
📒 Files selected for processing (1)
unstract/sdk1/src/unstract/sdk1/llm.py
```python
max_retries = completion_kwargs.get("max_retries")
if max_retries:
    completion_kwargs["num_retries"] = max_retries
    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
```
Handle zero and invalid retry counts explicitly.
At Line 474, using a truthy check skips an explicit max_retries=0 and allows negative values through. Please branch on `is not None` and validate bounds before copying.
Suggested patch

```diff
 max_retries = completion_kwargs.get("max_retries")
-if max_retries:
-    completion_kwargs["num_retries"] = max_retries
-    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
+if max_retries is None:
+    return
+if not isinstance(max_retries, int) or max_retries < 0:
+    raise SdkError("Invalid max_retries: expected a non-negative integer")
+completion_kwargs["num_retries"] = max_retries
+if max_retries > 0:
+    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@unstract/sdk1/src/unstract/sdk1/llm.py` around lines 473 - 476, The current
truthy check for max_retries skips explicit zero and permits negatives; change
the branch to test "max_retries is not None", validate that max_retries is an
integer and >= 0 (or raise a ValueError for invalid values), then set
completion_kwargs["num_retries"] = max_retries and
completion_kwargs["retry_strategy"] = "exponential_backoff_retry"; apply this
logic around the max_retries handling (completion_kwargs, max_retries,
num_retries, retry_strategy) so zero is honored and negatives/non-integers are
rejected.
SDK-based providers (OpenAI, Azure) default to max_retries=2 internally even when not explicitly set. Without zeroing it, the first attempt exhausts SDK retries before the wrapper retry kicks in, multiplying total attempts (e.g. 5 SDK + 5 wrapper = 11 instead of expected 5). Setting max_retries=0 ensures all retries go through litellm's wrapper uniformly across all providers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Test Results

Summary
Runner Tests - Full Report
SDK1 Tests - Full Report
♻️ Duplicate comments (1)

unstract/sdk1/src/unstract/sdk1/llm.py (1)

474-478: ⚠️ Potential issue | 🟡 Minor

Handle retry bounds explicitly instead of truthy-checking.

On Line 475, truthy branching is brittle for retry config. Please branch on `is None` and validate non-negative integer input before setting retry fields.

Suggested patch

```diff
 max_retries = completion_kwargs.get("max_retries")
-if max_retries:
-    completion_kwargs["num_retries"] = max_retries
-    completion_kwargs["max_retries"] = 0
-    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
+if max_retries is None:
+    return
+if not isinstance(max_retries, int) or max_retries < 0:
+    raise SdkError("Invalid max_retries: expected a non-negative integer")
+completion_kwargs["num_retries"] = max_retries
+completion_kwargs["max_retries"] = 0
+if max_retries > 0:
+    completion_kwargs["retry_strategy"] = "exponential_backoff_retry"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/sdk1/src/unstract/sdk1/llm.py` around lines 474 - 478, The current truthy check for max_retries is brittle; in unstract/sdk1/src/unstract/sdk1/llm.py locate the block using completion_kwargs and max_retries and change the branching to check "if max_retries is not None" then validate that max_retries is an integer and >= 0 (raise a ValueError if not), otherwise proceed to set completion_kwargs["num_retries"] = max_retries, completion_kwargs["max_retries"] = 0 and completion_kwargs["retry_strategy"] = "exponential_backoff_retry"; ensure invalid types or negative values are rejected with a clear error message referencing max_retries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@unstract/sdk1/src/unstract/sdk1/llm.py`:
- Around line 474-478: The current truthy check for max_retries is brittle; in
unstract/sdk1/src/unstract/sdk1/llm.py locate the block using completion_kwargs
and max_retries and change the branching to check "if max_retries is not None"
then validate that max_retries is an integer and >= 0 (raise a ValueError if
not), otherwise proceed to set completion_kwargs["num_retries"] = max_retries,
completion_kwargs["max_retries"] = 0 and completion_kwargs["retry_strategy"] =
"exponential_backoff_retry"; ensure invalid types or negative values are
rejected with a clear error message referencing max_retries.



What

- Uses litellm's wrapper-level retry (`completion_with_retries`) to work for all providers
- Copies `max_retries` to the `num_retries` parameter before calling litellm
- Sets `retry_strategy` to `exponential_backoff_retry` for all LLM completion calls

Why

Users configure `max_retries` in the adapter UI (defaults 3-5), but this only works for SDK-based providers (OpenAI, Azure). For httpx-based providers (Anthropic, Vertex AI, Bedrock, Mistral, Azure AI Foundry), retries are silently ignored — zero retries on transient errors.

Production incident: Anthropic API 500 error after ~9 minutes caused immediate execution failure (ID: 23194211-ac04-442f-8a04-dde0ecf06195).

litellm has a separate wrapper-level retry mechanism that works for ALL providers, but only activates when `num_retries` is set. Our code never set it.

How

Modified `LLM.complete()`, `LLM.stream_complete()`, and `LLM.acomplete()` in the SDK to:

- Copy `max_retries` from kwargs to `num_retries`
- Set `retry_strategy` to `exponential_backoff_retry`

litellm internally sets `max_retries=0` during wrapper retries to prevent double-retry with SDK providers.

Can this PR break any existing features? If yes, please list possible items. If no, please explain why.

No. This only activates litellm's existing wrapper retry mechanism with user-configured values. SDK-based providers (OpenAI, Azure) continue to use their native retry via `max_retries` (litellm sets it to 0 during wrapper retries). httpx-based providers now benefit from proper retry handling instead of failing immediately on transient errors.

Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
Notes on Testing
The fix can be validated by checking prompt-service logs for retry-related entries when transient LLM errors occur. Previously, errors would show a single failure; with the fix, you should see multiple retry attempts before final failure. Exponential backoff is applied (1s, 2s, 4s, 8s, 10s cap).
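The schedule quoted above is consistent with a doubling delay capped at 10 seconds; a quick illustrative sketch (the exact formula litellm uses internally is an assumption here):

```python
def backoff_delays(num_retries: int, base: float = 1.0, cap: float = 10.0) -> list[float]:
    # Delay doubles on each retry attempt and is capped: 1s, 2s, 4s, 8s, then 10s.
    return [min(base * 2**attempt, cap) for attempt in range(num_retries)]


print(backoff_delays(5))  # → [1.0, 2.0, 4.0, 8.0, 10.0]
```

With `num_retries=5`, a request that keeps failing therefore spends roughly 25 seconds in backoff before the final error surfaces.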
Screenshots
Checklist
I have read and understood the Contribution Guidelines.