
Fix an issue where the AI Assistant was not retaining conversation context between messages, with chat history compaction to manage token budgets.#9737

Merged
akshay-joshi merged 5 commits into pgadmin-org:master from dpage:chat_context
Mar 16, 2026

Conversation


@dpage dpage commented Mar 11, 2026

Summary by CodeRabbit

  • Bug Fixes

    • AI Assistant now retains conversation context between messages.
  • New Features

    • Conversation history compaction to optimize token usage while preserving key messages and tool-response pairs.
    • NLQ chat accepts incoming history and returns filtered conversation history and conversation IDs in streaming responses.
    • Client UI preserves/manages conversation history, supports clearing, and ignores late stream events.
  • Tests

    • Added comprehensive tests for compaction, serialization, filtering, and history-aware chat flows.
  • Documentation

    • Release notes and test guidance updated for history compaction.


coderabbitai bot commented Mar 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds an LLM conversation history compaction module and integrates history deserialization/compaction into NLQ chat: the backend accepts history, compacts it before calling the LLM, and returns the filtered conversational history and conversation_id; the frontend sends and receives history and handles cleared/concurrent streams; tests and release notes are added.

Changes

Cohort / File(s) Summary
Release Notes & Test Docs
docs/en_US/release_notes_9_14.rst, web/pgadmin/llm/tests/README.md
Added release-note entry for issue #9736 and documented compaction tests and coverage updates.
Compaction Module
web/pgadmin/llm/compaction.py
New module: token estimation, per-message/history token counting, message importance scoring, tool-call/result pairing, history compaction algorithm, serialization/deserialization, and conversational filtering.
Compaction Tests
web/pgadmin/llm/tests/test_compaction.py
New unit tests for token estimation, compaction behavior (preserve first/recent, drop low-importance, maintain tool-pair integrity), deserialization, and conversational filtering.
NLQ Chat Backend Integration
web/pgadmin/tools/sqleditor/__init__.py
nlq_chat_stream now accepts optional history in request, deserializes and compacts it, passes conversation_history to chat_with_database, and emits filtered conversational history plus conversation_id in SSE events.
NLQ Chat Frontend
web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx
Adds conversationHistory state, includes history in POST payloads, updates local history from SSE events, tracks cleared state to cancel/ignore late stream events, and adjusts stop/error handling.
NLQ Chat Tests
web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py
Adds "NLQ Chat - With History" scenario verifying history is accepted and conversation_history forwarded to chat_with_database; extends SSE stream test flows and mock inspection.
Manifest
package.json
Minor manifest line(s) modified.
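The compaction responsibilities listed above (token estimation, importance scoring, budget-driven dropping while preserving the first and most recent messages) can be illustrated with a minimal sketch. Everything below is an assumption for illustration — the 4-characters-per-token heuristic, the scoring rules, and the function bodies are not taken from the actual compaction.py:

```python
# Illustrative sketch only; not the real pgAdmin implementation.

def estimate_tokens(text):
    """Rough token estimate: assume ~4 characters per token."""
    return max(1, len(text) // 4)

def score_importance(msg):
    """Score a message; short acknowledgements rank lowest."""
    if msg['content'].strip().lower() in ('ok', 'thanks', 'got it'):
        return 0
    if msg['role'] == 'user':
        return 2
    return 1

def compact(messages, max_tokens):
    """Keep the first and most recent messages; drop low-importance
    middle messages until the history fits the token budget."""
    recent = 4
    protected = {0} | set(range(max(1, len(messages) - recent),
                                len(messages)))
    total = sum(estimate_tokens(m['content']) for m in messages)
    # Candidates sorted so the least important are dropped first.
    candidates = sorted(
        (i for i in range(len(messages)) if i not in protected),
        key=lambda i: score_importance(messages[i]),
    )
    dropped = set()
    for i in candidates:
        if total <= max_tokens:
            break
        total -= estimate_tokens(messages[i]['content'])
        dropped.add(i)
    return [m for i, m in enumerate(messages) if i not in dropped]
```

The key invariant, per the summary above, is that the first message and a recent window are never dropped; only unprotected middle messages are candidates.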

Sequence Diagram

sequenceDiagram
    participant Frontend as NLQChatPanel
    participant Backend as nlq_chat_stream
    participant Compaction as compaction.py
    participant LLM as chat_with_database

    Frontend->>Backend: POST (message + optional history)
    activate Backend

    Backend->>Compaction: deserialize_history(history_data)
    activate Compaction
    Compaction-->>Backend: Message[] 
    deactivate Compaction

    Backend->>Compaction: compact_history(messages, max_tokens)
    activate Compaction
    Note over Compaction: Classify messages by importance\nPreserve first & recent messages\nMaintain tool-call/result pairs\nDrop low-importance messages until within budget
    Compaction-->>Backend: compacted Message[]
    deactivate Compaction

    Backend->>LLM: chat_with_database(conversation_history, system_prompt)
    activate LLM
    LLM-->>Backend: updated_history
    deactivate LLM

    Backend->>Compaction: filter_conversational(updated_history)
    activate Compaction
    Note over Compaction: Remove internal tool messages\nKeep user & final assistant messages
    Compaction-->>Backend: filtered Message[]
    deactivate Compaction

    Backend-->>Frontend: SSE event (response + history + conversation_id)
    deactivate Backend

    Frontend->>Frontend: update conversationHistory state
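The diagram's backend flow can be condensed into a runnable sketch. The step names (deserialize_history, compact_history, chat_with_database, filter_conversational) come from the walkthrough above, but the bodies here are stand-in stubs, not the real implementations:

```python
import json
import uuid

# Stubbed pipeline steps; the real functions live in compaction.py
# and the LLM layer. Bodies are placeholders for illustration.
def deserialize_history(history_data):
    return [m for m in (history_data or []) if isinstance(m, dict)]

def compact_history(messages, max_tokens=4000):
    return messages  # real version trims to the token budget

def chat_with_database(message, conversation_history):
    return conversation_history + [
        {'role': 'user', 'content': message},
        {'role': 'assistant', 'content': 'stub reply'},
    ]

def filter_conversational(history):
    # Real version also removes internal tool messages.
    return [m for m in history if m.get('role') in ('user', 'assistant')]

def nlq_chat_sse(message, history_data, conversation_id=None):
    """Yield SSE events mirroring the sequence diagram above."""
    history = compact_history(deserialize_history(history_data))
    updated = chat_with_database(message, history)
    payload = {
        'type': 'complete',
        'history': filter_conversational(updated),
        'conversation_id': conversation_id or str(uuid.uuid4()),
    }
    yield f'data: {json.dumps(payload)}\n\n'
```

The final SSE event carries both the filtered history and the conversation_id, which is what lets the frontend round-trip context on the next request.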

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 75.68%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title clearly and accurately summarizes the main change (fixing AI Assistant conversation context retention and adding chat history compaction for token budgeting).





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/pgadmin/llm/compaction.py`:
- Around line 185-212: _find_tool_pair_indices currently only records one
partner index per assistant turn, which loses additional tool-result indices
when a single assistant message has multiple Message.tool_calls; change it to
return (or build) groupings of indices so each index maps to the full set of
indices that belong to that tool-use turn (e.g., map an assistant index and all
matching tool-result indices to the same set/list of indices), update the
function logic to collect all matching j indices for every assistant i (use a
collection like set for tool_call_ids and matched_tool_indices), and then return
a mapping where every index in that group points to the complete group (and
update compact_history to consult the group for protection/removal instead of
expecting a single partner).
- Around line 249-258: The current protected set (protected, protected.add(0),
recent_window) can itself exceed max_tokens and leave candidates empty; add a
fallback after building protected that computes the token count of protected
messages (use the same tokenization logic used elsewhere) and, if
protected_tokens > max_tokens, iteratively relax protections by shrinking the
recent window (e.g., decrement recent_window or remove the highest-index entries
from protected, preserving index 0) until protected_tokens <= max_tokens or only
the first message remains protected; then rebuild candidates and proceed so
compaction can produce a trimmed history even when the initially protected set
is over budget.

In `@web/pgadmin/llm/tests/test_compaction.py`:
- Around line 138-168: The test_drops_low_value fixture accidentally inflates
low-value messages with the shared filler so they no longer classify as
low-importance; update the test so the transient/routine messages (the ones with
contents 'ok', 'thanks', etc.) remain short (remove the appended filler for
those specific Message.user/Message.assistant entries) or alternatively stub the
token estimation used by compact_history/_classify_message to force those
entries to be low-value, then assert that those exact short entries are removed
while the first and recent messages remain.

In `@web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx`:
- Around line 408-412: The Clear handler (handleClearConversation) currently
resets state via setMessages, setConversationId and setConversationHistory but
does not prevent in-flight stream handlers from later re-appending old events;
add a silent-cancel token/guard: store a mutable requestId or AbortController in
component state (e.g., currentRequestToken/currentAbortController), increment or
replace it inside handleClearConversation, and update all
streaming/read/complete event handlers to compare the incoming event's token
against the latest token (or check abortController.signal) and return early if
it doesn’t match, so late "Generation stopped." or "complete" events are ignored
and cannot repopulate messages/conversationId/conversationHistory after Clear.

In `@web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py`:
- Around line 50-65: The test case for "NLQ Chat - With History" doesn't verify
that nlq_chat_stream() passes the provided history to chat_with_database() or
that the SSE output includes the history; update the test to assert that
chat_with_database() was called with a non-empty conversation_history (or
specifically equals the provided history) and/or parse the emitted SSE event
from nlq_chat_stream() to assert the returned payload contains the history field
matching the input; reference the test helper/mock for chat_with_database(), the
nlq_chat_stream() invocation, and the "history" variable in the test to locate
where to add these assertions.
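The grouping change requested in the first comment — mapping every index in a tool-use turn to the full set of indices for that turn — can be sketched as follows. The message shape (plain dicts with tool_calls and tool_call_id keys) is an assumption standing in for the real Message class, which is not shown in this review:

```python
def find_tool_groups(messages):
    """Map every index in a tool-use turn to the full set of indices
    for that turn (the assistant message plus ALL of its tool results),
    so a turn with multiple tool calls is never split."""
    groups = {}
    for i, msg in enumerate(messages):
        calls = msg.get('tool_calls') or []
        if not calls:
            continue
        call_ids = {c['id'] for c in calls}
        member_indices = {i}
        # Collect every tool result answering one of this turn's calls.
        for j in range(i + 1, len(messages)):
            if messages[j].get('tool_call_id') in call_ids:
                member_indices.add(j)
        # Every member points at the same shared group set.
        for idx in member_indices:
            groups[idx] = member_indices
    return groups
```

With this shape, compaction can protect or drop a whole group atomically: if any member of a group is protected, union in the rest, which is exactly the follow-up fix described in the later review round.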

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a096758c-1b91-42eb-8f4f-519c35414927

📥 Commits

Reviewing files that changed from the base of the PR and between c8bd75c and c30e633.

📒 Files selected for processing (7)
  • docs/en_US/release_notes_9_14.rst
  • web/pgadmin/llm/compaction.py
  • web/pgadmin/llm/tests/README.md
  • web/pgadmin/llm/tests/test_compaction.py
  • web/pgadmin/tools/sqleditor/__init__.py
  • web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx
  • web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py


dpage commented Mar 12, 2026

The failed test here appears to be the one fixed in #9723


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx (1)

409-424: ⚠️ Potential issue | 🟠 Major

Make Clear request-scoped instead of relying on one shared clearedRef.

Line 464 resets clearedRef.current for every new submit. If request A is cleared and request B starts before A finishes unwinding, A's later cleanup can still clobber B's active refs/loading state and potentially reapply stale conversation state. Please gate cleanup and SSE handling with a per-request token (or current controller identity), not a single component-wide boolean.

💡 Suggested direction
+  const requestIdRef = useRef(0);
+
   const handleClearConversation = () => {
+    requestIdRef.current += 1;
     clearedRef.current = true;
     ...
   };

   const handleSubmit = async () => {
+    const requestId = ++requestIdRef.current;
     stoppedRef.current = false;
     clearedRef.current = false;
     ...
-              handleSSEEvent(data, thinkingId);
+              if (requestIdRef.current === requestId) {
+                handleSSEEvent(data, thinkingId);
+              }
     ...
-    } finally {
-      setIsLoading(false);
-      setThinkingMessageId(null);
-    }
+    } finally {
+      if (requestIdRef.current === requestId) {
+        setIsLoading(false);
+        setThinkingMessageId(null);
+      }
+    }
   };

Also applies to: 462-464, 579-582

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx`
around lines 409 - 424, The component-wide clearedRef flow causes race
conditions; replace it with a request-scoped identity (e.g., a requestId or
attach an id to the AbortController) and use that id to gate cleanup and SSE
handlers instead of clearedRef.current. On submit create a new id (or use
abortControllerRef.current) and store it as the “currentRequestId” on a ref;
when starting/handling streams (and in any async callbacks) compare the
callback’s captured requestId to the live currentRequestId before mutating
readerRef, abortControllerRef, setMessages, setIsLoading, setConversationId, or
setConversationHistory. Modify handleClearConversation to cancel/abort the
specific controller and advance/clear the currentRequestId so only that
request’s callbacks will run cleanup, and ensure all SSE/stream handlers check
the request-scoped id before applying state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/pgadmin/llm/compaction.py`:
- Around line 256-279: The recent-window protection logic builds the protected
set (protected, recent_window, recent_start) but can split tool-use groups
leaving orphaned Role.TOOL or assistant tool-call messages; before computing
protected_tokens (and before the while budget loop) expand protected to include
entire tool-use groups or mark any group that intersects protected as
undroppable so all partner messages are included; specifically, after building
protected (and similarly in the other similar blocks around the code handling
indices 311-318 and 333-339), iterate the message group boundaries (using
messages, their roles and any tool-call markers used by your grouping logic) to
union in all indices of any group that intersects protected, then compute
protected_tokens via estimate_message_tokens(provider) on the expanded set so
the budgeting loop never leaves an orphaned tool message.
- Around line 349-400: deserialize_history currently assumes history_data and
nested items are dicts and calls .get() directly, which will crash if any item
is None or not a dict; update deserialize_history to validate shapes before
accessing keys: ensure history_data is an iterable of dicts (skip non-dict
items), treat item.get('tool_calls') and item.get('tool_results') as
(item.get(...) or []) and iterate only over entries that are dicts, and check
each tc_data/tr_data is a dict before constructing ToolCall/ToolResult; also
validate role_str is a string before passing to Role(role_str) and skip
malformed entries so Message, Role, ToolCall, ToolResult creation only occurs
for well-formed dicts.

In `@web/pgadmin/llm/tests/test_compaction.py`:
- Around line 133-136: The test only compares overlapping elements so it can
pass if compact_history() returns fewer than four messages; update the assertion
to ensure the full four-message tail is preserved by first asserting
len(result[-4:]) == 4 (or len(result) >= 4) and then asserting the two slices
are equal (compare last_4_original == last_4_result) so the test fails when any
protected recent message is missing; locate this change around compact_history()
usage and the variables last_4_original / last_4_result in test_compaction.py.

---

Duplicate comments:
In `@web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx`:
- Around line 409-424: The component-wide clearedRef flow causes race
conditions; replace it with a request-scoped identity (e.g., a requestId or
attach an id to the AbortController) and use that id to gate cleanup and SSE
handlers instead of clearedRef.current. On submit create a new id (or use
abortControllerRef.current) and store it as the “currentRequestId” on a ref;
when starting/handling streams (and in any async callbacks) compare the
callback’s captured requestId to the live currentRequestId before mutating
readerRef, abortControllerRef, setMessages, setIsLoading, setConversationId, or
setConversationHistory. Modify handleClearConversation to cancel/abort the
specific controller and advance/clear the currentRequestId so only that
request’s callbacks will run cleanup, and ensure all SSE/stream handlers check
the request-scoped id before applying state.
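The defensive deserialization described in the comments above can be sketched like this; the dict-based message shape and the role names are assumptions standing in for the real Message/Role types:

```python
def deserialize_history(history_data):
    """Defensively rebuild history from untrusted request JSON,
    skipping anything that is not a well-formed message dict."""
    valid_roles = {'user', 'assistant', 'tool'}
    messages = []
    if not isinstance(history_data, list):
        return messages
    for item in history_data:
        if not isinstance(item, dict):
            continue
        role = item.get('role')
        # Non-string roles (None, lists, dicts) are skipped, never raised.
        if not isinstance(role, str) or role not in valid_roles:
            continue
        content = item.get('content')
        if not isinstance(content, str):
            content = '' if content is None else str(content)
        # Treat missing/None tool data as empty; keep only dict entries.
        tool_calls = [tc for tc in (item.get('tool_calls') or [])
                      if isinstance(tc, dict)]
        messages.append({'role': role, 'content': content,
                         'tool_calls': tool_calls})
    return messages
```

The point of the review feedback is that every access is shape-checked first, so one malformed entry degrades to a skip rather than failing the whole turn.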

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 49df7280-efd9-4ef9-9139-880143b82c2a

📥 Commits

Reviewing files that changed from the base of the PR and between c30e633 and 11ea048.

📒 Files selected for processing (4)
  • web/pgadmin/llm/compaction.py
  • web/pgadmin/llm/tests/test_compaction.py
  • web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx
  • web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py

dpage and others added 3 commits March 16, 2026 08:55
…ntext between messages, with chat history compaction to manage token budgets.
- Track tool-use turns as groups instead of one-to-one pairs, so
  multi-tool assistant messages don't leave orphaned results.
- Add fallback to shrink the recent window when protected messages
  alone exceed the token budget, preventing compaction no-ops.
- Fix low-value test fixtures to keep transient messages short so
  they actually classify as low-importance.
- Guard Clear button against in-flight stream race conditions by
  adding a clearedRef flag and cancelling active streams.
- Assert that conversation history is actually passed through to
  chat_with_database in the "With History" test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Expand protected set to cover full tool groups, preventing orphaned
  tool call/result messages when a turn straddles the recent window.
- Add input validation in deserialize_history() for non-list/non-dict data.
- Strengthen test assertion for preserved recent window tail.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (3)
web/pgadmin/llm/compaction.py (2)

267-288: ⚠️ Potential issue | 🟠 Major

Re-budget after expanding protected tool groups.

Line 283 expands protected after the protected-window budget loop (Line 269). If group expansion itself pushes protected tokens over max_tokens, compaction can still return oversized history when candidates are exhausted.

💡 Proposed fix
-    # If protected messages alone exceed the budget, shrink the
-    # recent window until we have room for compaction candidates.
-    while recent_window > 0:
+    # Find tool groups early so protection + budgeting are consistent.
+    tool_groups = _find_tool_pair_indices(messages)
+
+    def _expand_protected(indices: set[int]) -> set[int]:
+        expanded = set(indices)
+        for idx in list(indices):
+            if idx in tool_groups:
+                expanded |= set(tool_groups[idx])
+        return expanded
+
+    protected = _expand_protected(protected)
+
+    # If protected messages alone exceed the budget, shrink the recent window.
+    while recent_window > 0:
         protected_tokens = sum(
             estimate_message_tokens(messages[i], provider)
             for i in protected
         )
         if protected_tokens <= max_tokens:
             break
         recent_window -= 1
         recent_start = max(1, total - recent_window)
-        protected = {0} | set(range(recent_start, total))
-
-    # Find tool groups
-    tool_groups = _find_tool_pair_indices(messages)
-
-    # Expand protected set to include entire tool groups so we never
-    # split a tool-use turn (leaving orphaned call or result messages).
-    for i in list(protected):
-        if i in tool_groups:
-            protected |= set(tool_groups[i])
+        protected = _expand_protected({0} | set(range(recent_start, total)))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/compaction.py` around lines 267 - 288, After expanding the
protected set with tool groups (using _find_tool_pair_indices and updating
protected), recompute the protected_tokens and re-run the recent_window
shrinking logic so protected_tokens <= max_tokens; specifically, after the loop
that adds tool group indices to protected, recalculate protected_tokens =
sum(estimate_message_tokens(messages[i], provider) for i in protected) and, if
it exceeds max_tokens, reduce recent_window and update recent_start/protected
exactly like the earlier budget loop until protected_tokens <= max_tokens
(ensure you reference the same variables recent_window, recent_start, protected,
max_tokens, messages, provider).

378-384: ⚠️ Potential issue | 🟠 Major

Harden role deserialization for malformed payloads.

role is external input. A non-string/unhashable value can still raise during enum conversion and fail the turn instead of skipping the bad entry.

💡 Proposed fix
         role_str = item.get('role', '')
         content = item.get('content', '')
+        if not isinstance(role_str, str):
+            continue
+        if not isinstance(content, str):
+            content = '' if content is None else str(content)

         try:
             role = Role(role_str)
-        except ValueError:
+        except (ValueError, TypeError):
             continue  # Skip unknown roles
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/compaction.py` around lines 378 - 384, The enum conversion of
external input fails for non-string/unhashable values; in the loop around
Role(role_str) (using variables role_str, item and enum Role) validate/coerce
role_str to a plain str (e.g., if not isinstance(role_str, str) then skip or
convert) and expand the exception handler to catch both ValueError and TypeError
so malformed payloads (None, dicts, lists, etc.) are skipped rather than
bubbling up and breaking the turn.
web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx (1)

563-568: ⚠️ Potential issue | 🟠 Major

Cleared conversations can still be repopulated by buffered SSE events.

handleSSEEvent does not guard against clearedRef.current. A late buffered complete event can still restore old messages/history after Clear.

💡 Proposed fix
         for (const line of lines) {
+          if (clearedRef.current) break;
           if (line.startsWith('data: ')) {
             try {
               const data = JSON.parse(line.slice(6));
-              handleSSEEvent(data, thinkingId);
+              if (!clearedRef.current) {
+                handleSSEEvent(data, thinkingId);
+              }
             } catch {
               // Skip malformed JSON
             }
           }
         }

   const handleSSEEvent = (event, thinkingId) => {
+    if (clearedRef.current) return;
     switch (event.type) {

Also applies to: 630-679

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx`
around lines 563 - 568, The SSE parsing loop in NLQChatPanel.jsx calls
handleSSEEvent for every parsed event without checking clearedRef.current, so
late buffered "complete" or other events can repopulate a cleared conversation;
update the logic in the loop that parses lines (the block that JSON.parses
line.slice(6) and calls handleSSEEvent) to check clearedRef.current (or an
equivalent "isCleared" signal) before invoking handleSSEEvent and drop events if
the chat was cleared for the current thinkingId; apply the same guard to the
other SSE handling block referenced (around the 630-679 range) so no stale SSE
events are processed after Clear.
🧹 Nitpick comments (2)
web/pgadmin/llm/compaction.py (1)

330-330: Use `_score` in the second-pass loop to satisfy the linter and make intent clear.

💡 Proposed fix
-        for idx, score in candidates:
+        for idx, _score in candidates:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/compaction.py` at line 330, The second-pass loop currently
unpacks candidates as "for idx, score in candidates:" but the variable is
unused; change the unpack to "for idx, _score in candidates:" and update any
references inside the loop that intend the unused value to use _score (or remove
references if truly unused) in the function that contains this loop (the second
pass loop in compaction logic), so the lint warning is resolved and intent is
clearer.
web/pgadmin/llm/tests/test_compaction.py (1)

214-227: Add a malformed-role regression case for deserialization robustness.

Given this path consumes request JSON, include a case where role is non-string/unhashable to ensure invalid entries are skipped without exceptions.

💡 Suggested test addition
     scenarios = [
@@
         ('Deserialize - Skips Unknown Roles', dict(
             test_method='test_unknown_roles'
         )),
+        ('Deserialize - Skips Malformed Role Types', dict(
+            test_method='test_malformed_role_type'
+        )),
     ]
@@
     def test_unknown_roles(self):
@@
         result = deserialize_history(data)
         self.assertEqual(len(result), 2)
+
+    def test_malformed_role_type(self):
+        """Should skip entries with non-string/unhashable roles."""
+        data = [
+            {'role': 'user', 'content': 'Hello'},
+            {'role': ['assistant'], 'content': 'bad role type'},
+            {'role': None, 'content': 'also bad'},
+            {'role': 'assistant', 'content': 'Hi'},
+        ]
+        result = deserialize_history(data)
+        self.assertEqual([m.role for m in result], [Role.USER, Role.ASSISTANT])

Also applies to: 272-281

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/tests/test_compaction.py` around lines 214 - 227, Add a
regression scenario to the scenarios list to cover malformed/unhashable role
values so deserialization skips invalid entries without raising; specifically
update the scenarios variable (the list defined around the top of
test_compaction.py) to include a case like ('Deserialize - Malformed Role',
dict(test_method='test_malformed_role')), and add the corresponding
test_malformed_role test implementation (and mirror the same additional scenario
in the second scenarios block around lines 272-281) so the test suite submits a
non-string/unhashable role in the request JSON and asserts that deserialization
silently skips it rather than throwing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@web/pgadmin/llm/compaction.py`:
- Around line 267-288: After expanding the protected set with tool groups (using
_find_tool_pair_indices and updating protected), recompute the protected_tokens
and re-run the recent_window shrinking logic so protected_tokens <= max_tokens;
specifically, after the loop that adds tool group indices to protected,
recalculate protected_tokens = sum(estimate_message_tokens(messages[i],
provider) for i in protected) and, if it exceeds max_tokens, reduce
recent_window and update recent_start/protected exactly like the earlier budget
loop until protected_tokens <= max_tokens (ensure you reference the same
variables recent_window, recent_start, protected, max_tokens, messages,
provider).
- Around line 378-384: The enum conversion of external input fails for
non-string/unhashable values; in the loop around Role(role_str) (using variables
role_str, item and enum Role) validate/coerce role_str to a plain str (e.g., if
not isinstance(role_str, str) then skip or convert) and expand the exception
handler to catch both ValueError and TypeError so malformed payloads (None,
dicts, lists, etc.) are skipped rather than bubbling up and breaking the turn.

In `@web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx`:
- Around line 563-568: The SSE parsing loop in NLQChatPanel.jsx calls
handleSSEEvent for every parsed event without checking clearedRef.current, so
late buffered "complete" or other events can repopulate a cleared conversation;
update the logic in the loop that parses lines (the block that JSON.parses
line.slice(6) and calls handleSSEEvent) to check clearedRef.current (or an
equivalent "isCleared" signal) before invoking handleSSEEvent and drop events if
the chat was cleared for the current thinkingId; apply the same guard to the
other SSE handling block referenced (around the 630-679 range) so no stale SSE
events are processed after Clear.

---

Nitpick comments:
In `@web/pgadmin/llm/compaction.py`:
- Line 330: The second-pass loop currently unpacks candidates as "for idx, score
in candidates:" but the variable is unused; change the unpack to "for idx,
_score in candidates:" and update any references inside the loop that intend the
unused value to use _score (or remove references if truly unused) in the
function that contains this loop (the second pass loop in compaction logic), so
the lint warning is resolved and intent is clearer.

In `@web/pgadmin/llm/tests/test_compaction.py`:
- Around line 214-227: Add a regression scenario to the scenarios list to cover
malformed/unhashable role values so deserialization skips invalid entries
without raising; specifically update the scenarios variable (the list defined
around the top of test_compaction.py) to include a case like ('Deserialize -
Malformed Role', dict(test_method='test_malformed_role')), and add the
corresponding test_malformed_role test implementation (and mirror the same
additional scenario in the second scenarios block around lines 272-281) so the
test suite submits a non-string/unhashable role in the request JSON and asserts
that deserialization silently skips it rather than throwing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0130a15a-7eb1-48e1-a56f-9edb8311e743

📥 Commits

Reviewing files that changed from the base of the PR and between 11ea048 and e2c38e1.

📒 Files selected for processing (7)
  • docs/en_US/release_notes_9_14.rst
  • web/pgadmin/llm/compaction.py
  • web/pgadmin/llm/tests/README.md
  • web/pgadmin/llm/tests/test_compaction.py
  • web/pgadmin/tools/sqleditor/__init__.py
  • web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx
  • web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py
✅ Files skipped from review due to trivial changes (1)
  • web/pgadmin/llm/tests/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/en_US/release_notes_9_14.rst

- Lower max_tokens budget in test_drops_low_value to reliably force
  compaction (500 was borderline, use 200).
- Consume SSE response data before asserting mock calls in NLQ chat
  test, since Flask's streaming generator only executes on iteration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py (1)

165-180: Stream consumption fix is correct; consider strengthening the assertion.

Consuming response.data before asserting mock calls is the right approach since Flask streaming generators execute lazily. The assertion confirms history is passed but doesn't verify it matches the input.

For stronger coverage, you could assert the deserialized history length or content matches expectations:

♻️ Optional: Assert specific history content
                     conv_hist = call_kwargs.get(
                         'conversation_history', []
                     )
                     self.assertTrue(
                         len(conv_hist) > 0,
                         'conversation_history should be non-empty '
                         'when history is provided'
                     )
+                    # Verify we got the expected number of messages
+                    self.assertEqual(
+                        len(conv_hist), len(self.history),
+                        'conversation_history should contain all '
+                        'provided history messages'
+                    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py` around lines 165 - 180,
The test currently only checks that mock_chat_obj was called and that
conversation_history is non-empty after consuming response.data; strengthen this
by asserting the conversation_history content/length matches the expected input
history used in the test (e.g., compare deserialized conversation_history
returned from mock_chat_obj.call_args.kwargs to the test's self.history or
expected list). Locate the assertion block around
mock_chat_obj.assert_called_once() in test_nlq_chat.py, extract call_kwargs =
mock_chat_obj.call_args.kwargs and conv_hist =
call_kwargs.get('conversation_history', []), then add assertions that conv_hist
== self.history (or assert specific elements/length) to ensure the exact history
was passed through.
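The laziness behind this fix can be demonstrated without Flask at all; the sketch below mirrors how a generator-backed streaming response defers all work, including the mocked LLM call, until the body is consumed (names are illustrative):

```python
from unittest import mock

# Stand-in for the mocked chat_with_database backend.
backend = mock.Mock(return_value=['data: one\n\n', 'data: two\n\n'])

def sse_stream():
    # Nothing here runs until the response body is iterated,
    # exactly like a Flask generator-backed streaming response.
    for chunk in backend(conversation_history=[{'role': 'user'}]):
        yield chunk

response = sse_stream()            # generator created, backend NOT called
assert backend.call_count == 0     # asserting on the mock now is premature

body = ''.join(response)           # consuming the stream runs the generator
assert backend.call_count == 1
assert 'data: one' in body
```

This is why the test must read `response.data` (which iterates the stream) before inspecting `mock_chat_obj.call_args`.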
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py`:
- Around line 115-119: The test is patching the definition site
('pgadmin.llm.chat.chat_with_database') instead of the import site, so the local
binding imported in sqleditor.__init__ still calls the real function and the
mock's call_args stay empty; update the mock patch target used by
mock_chat_patcher to the import site where sqleditor binds chat_with_database
(i.e., patch 'pgadmin.tools.sqleditor.chat_with_database' or the exact module
name used by sqleditor.__init__) so that the local name used in the code under
test is replaced and assertions on mock_chat_obj.call_args become meaningful.
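The definition-site versus import-site distinction driving this finding can be made concrete with two throwaway modules (all names here are hypothetical; a later commit notes the endpoint actually imports inside the function body, which is exactly the case where patching the source module works):

```python
import sys
import types
from unittest import mock

# Two throwaway modules standing in for the chat provider and the
# sqleditor consumer.
provider = types.ModuleType('provider')
provider.helper = lambda: 'real'
sys.modules['provider'] = provider

consumer = types.ModuleType('consumer')
# Module-level `from provider import helper` binds the name at import time:
consumer.helper = provider.helper
consumer.call_module_level = lambda: consumer.helper()

def call_local_import():
    # A local import looks provider up at call time instead.
    from provider import helper
    return helper()

consumer.call_local_import = call_local_import
sys.modules['consumer'] = consumer

with mock.patch('provider.helper', return_value='mocked'):
    # Patching the definition site misses the already-bound name...
    assert consumer.call_module_level() == 'real'
    # ...but DOES intercept the call-time local import.
    assert consumer.call_local_import() == 'mocked'

with mock.patch('consumer.helper', return_value='mocked'):
    # Patching the import site replaces what the consumer actually calls.
    assert consumer.call_module_level() == 'mocked'
```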

---

Nitpick comments:
In `@web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py`:
- Around line 165-180: The test currently only checks that mock_chat_obj was
called and that conversation_history is non-empty after consuming response.data;
strengthen this by asserting the conversation_history content/length matches the
expected input history used in the test (e.g., compare deserialized
conversation_history returned from mock_chat_obj.call_args.kwargs to the test's
self.history or expected list). Locate the assertion block around
mock_chat_obj.assert_called_once() in test_nlq_chat.py, extract call_kwargs =
mock_chat_obj.call_args.kwargs and conv_hist =
call_kwargs.get('conversation_history', []), then add assertions that conv_hist
== self.history (or assert specific elements/length) to ensure the exact history
was passed through.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 23dbe688-7bf2-44ec-ba2b-ce8610f2b8d1

📥 Commits

Reviewing files that changed from the base of the PR and between e2c38e1 and 96820b0.

📒 Files selected for processing (2)
  • web/pgadmin/llm/tests/test_compaction.py
  • web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py

Add comment explaining why we patch the source module rather than the
use site: the endpoint uses a local import inside the function body,
so there is no module-level binding to patch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py (1)

132-137: started_mocks is populated but never used.

The list collects started mock objects but is not referenced after the loop. Consider removing it if not needed, or using it for cleanup verification.

♻️ Suggested simplification
         # Start all patches
-        started_mocks = []
         for p in patches:
             m = p.start()
-            started_mocks.append(m)
             if p is mock_chat_patcher:
                 mock_chat_obj = m
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py` around lines 132 - 137,
started_mocks is being populated but never used; remove the started_mocks list
and the started_mocks.append(m) call inside the loop, and keep only m =
p.start() with the existing conditional to set mock_chat_obj when p is
mock_chat_patcher (or alternatively, if you intended to use started_mocks for
cleanup, replace the unused list by using it to stop each mock in teardown by
iterating over started_mocks and calling stop on each). Ensure references to
started_mocks are removed or replaced so there are no unused variables.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py`:
- Around line 132-137: started_mocks is being populated but never used; remove
the started_mocks list and the started_mocks.append(m) call inside the loop, and
keep only m = p.start() with the existing conditional to set mock_chat_obj when
p is mock_chat_patcher (or alternatively, if you intended to use started_mocks
for cleanup, replace the unused list by using it to stop each mock in teardown
by iterating over started_mocks and calling stop on each). Ensure references to
started_mocks are removed or replaced so there are no unused variables.
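The cleanup alternative mentioned above can be sketched as follows; the patch targets are illustrative stand-ins for the real pgAdmin patches:

```python
import types
from unittest import mock

# Stand-in for the patched pgAdmin modules.
svc = types.SimpleNamespace(fetch=lambda: 'real-fetch',
                            send=lambda: 'real-send')

class FakeTest:
    def setUp(self):
        self.patches = [
            mock.patch.object(svc, 'fetch', return_value='mock-fetch'),
            mock.patch.object(svc, 'send', return_value='mock-send'),
        ]
        # The patcher objects, not the started mocks, are what teardown
        # needs -- so started_mocks can be dropped entirely.
        for p in self.patches:
            p.start()

    def tearDown(self):
        for p in self.patches:
            p.stop()

t = FakeTest()
t.setUp()
assert svc.fetch() == 'mock-fetch' and svc.send() == 'mock-send'
t.tearDown()
assert svc.fetch() == 'real-fetch' and svc.send() == 'real-send'
```

Keeping only the patcher list and stopping each patcher in `tearDown` (or registering `p.stop` via `addCleanup`) restores the real bindings without an unused variable.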

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 982f01b0-5ffa-48e1-8357-f991dc0ecb6d

📥 Commits

Reviewing files that changed from the base of the PR and between 96820b0 and 4621155.

📒 Files selected for processing (1)
  • web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py

@akshay-joshi akshay-joshi merged commit 990b4f5 into pgadmin-org:master Mar 16, 2026
32 of 33 checks passed