PSv2: Show online / offline status for async processing services#1146
Conversation
✅ Deploy Preview for antenna-ssec ready!
✅ Deploy Preview for antenna-preview ready!
📝 Walkthrough

Renamed ProcessingService monitoring fields from `last_checked*` to `last_seen*`; added `is_async` and `mark_seen()` to record heartbeats for pull-mode services; updated models, serializers, views, tasks, migrations, tests, UI, and docs to use `last_seen*` and emit heartbeats on pipeline registration and job endpoints.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant JobsView as "Jobs View"
    participant PSModel as "ProcessingService"
    participant DB as "Database"
    Client->>JobsView: POST /jobs/{id}/tasks or /results
    JobsView->>PSModel: _mark_pipeline_pull_services_seen(job)
    PSModel->>PSModel: for each async service -> mark_seen(live=True)
    PSModel->>DB: UPDATE processing_service SET last_seen=now(), last_seen_live=TRUE
    DB-->>PSModel: OK
    PSModel-->>JobsView: done
    JobsView-->>Client: continue tasks/results flow
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Force-pushed from 77dd024 to 2c029f8.
Actionable comments posted: 2
🧹 Nitpick comments (3)
ui/src/pages/project/processing-services/processing-services-columns.tsx (1)
54-59: Consider using the translation key for consistency.

The status details use a hardcoded `'Last seen '` string, while other parts of the codebase (e.g., `processing-service-details-dialog.tsx` at line 81) use `translate(STRING.FIELD_LABEL_LAST_SEEN)`. For i18n consistency, consider using the translation function here as well.

Suggested change
```diff
 renderCell: (item: ProcessingService) => (
   <StatusTableCell
     color={item.status.color}
-    details={'Last seen ' + item.lastSeen}
+    details={translate(STRING.FIELD_LABEL_LAST_SEEN) + ' ' + item.lastSeen}
     label={item.status.label}
   />
 ),
```
ami/ml/models/pipeline.py (1)
1072-1092: Field renames look correct, but note pre-existing edge case.

The field references are correctly updated to `last_seen_live` and `last_seen_latency`. However, there's a pre-existing edge case: if all online processing services have `last_seen_latency` as `None`, `processing_service_lowest_latency` will never be assigned, causing an `UnboundLocalError` at line 1088. This isn't introduced by this PR, but consider addressing it as a follow-up if async/pull-mode services may not have latency values.
ami/ml/tasks.py (1)
121-123: Use `logger.exception()` to preserve traceback when service health checks fail.

At line 122, `logger.error()` logs only the exception message, losing the full stack trace. When debugging intermittent polling failures in production, the traceback is essential. Use `logger.exception()` instead, which is designed for exception handlers and automatically includes the full context.

♻️ Proposed refactor
```diff
 except Exception as e:
-    logger.error(f"Error checking service {service}: {e}")
+    logger.exception("Error checking service %s", service)
     continue
```
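The difference the review points out is observable with a plain `logging` setup: `logger.exception()` attaches the current traceback to the record, while `logger.error()` records only the message. A self-contained sketch (not the project's actual logger configuration):

```python
import io
import logging

# Capture log output in a buffer so we can inspect it.
buf = io.StringIO()
logger = logging.getLogger("heartbeat-demo")
logger.addHandler(logging.StreamHandler(buf))
logger.setLevel(logging.ERROR)


def check(service: str) -> None:
    raise ConnectionError("connection refused")


try:
    check("sync-1")
except Exception:
    # logger.exception() logs at ERROR level and appends the traceback.
    logger.exception("Error checking service %s", "sync-1")

output = buf.getvalue()
```

With `logger.error(...)` in the same spot, `output` would contain the message but no `Traceback` section.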
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ami/jobs/views.py`:
- Around line 49-52: The bulk update on
job.pipeline.processing_services.async_services().update(...) incorrectly
updates services across all projects; restrict the update to services linked to
the current job's project by adding an M2M filter on the ProcessingService
projects relation (e.g., replace async_services().update(...) with
async_services().filter(projects=job.project).update(...) or
.filter(projects__in=[job.project]).update(...) as appropriate), ensuring you
reference the existing job.pipeline and job.project objects so only services
associated with this job's project are marked last_seen/last_seen_live.
In `@ui/src/data-services/models/processing-service.ts`:
- Around line 53-54: The endpointUrl getter currently returns
this._processingService.endpoint_url using nullish coalescing, which leaves
empty string values intact; update the endpointUrl getter to normalize both null
and "" (and optionally whitespace-only strings) to undefined by reading const v
= this._processingService.endpoint_url and returning undefined when v is null or
v.trim() === "" (otherwise return v) so the UI treats empty endpoints as
async/pull-mode; modify the endpointUrl getter implementation accordingly.
📒 Files selected for processing (16)

- .agents/AGENTS.md
- .agents/DATABASE_SCHEMA.md
- ami/jobs/views.py
- ami/ml/migrations/0027_rename_last_checked_to_last_seen.py
- ami/ml/models/pipeline.py
- ami/ml/models/processing_service.py
- ami/ml/serializers.py
- ami/ml/tasks.py
- ami/ml/tests.py
- ami/ml/views.py
- ui/src/data-services/models/pipeline.ts
- ui/src/data-services/models/processing-service.ts
- ui/src/pages/processing-service-details/processing-service-details-dialog.tsx
- ui/src/pages/project/pipelines/pipelines-columns.tsx
- ui/src/pages/project/processing-services/processing-services-columns.tsx
- ui/src/utils/language.ts
Pull request overview
Renames ProcessingService.last_checked* fields to last_seen* across backend and UI, and adds heartbeat-style liveness tracking for pull-mode (async/no-endpoint) processing services.
Changes:
- Backend: rename model fields + add `mark_seen()` and async/sync query helpers; update polling task + endpoints to update `last_seen`.
- Frontend: rename model accessors/columns/labels from "last checked" to "last seen".
- Data migration: add Django `RenameField` migration to preserve existing data.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| ami/ml/models/processing_service.py | Renames fields, adds mark_seen(), and changes get_status() semantics for pull-mode services. |
| ami/ml/tasks.py | Updates periodic liveness checker to poll sync services + mark stale async services offline. |
| ami/ml/views.py | Marks service seen after successful async pipeline registration. |
| ami/jobs/views.py | Adds heartbeat updates on async worker task-fetch/result-submit endpoints. |
| ami/ml/serializers.py | Renames serialized fields to last_seen/last_seen_live. |
| ami/ml/models/pipeline.py | Updates queryset filters and comments to use last_seen*. |
| ami/ml/migrations/0027_rename_last_checked_to_last_seen.py | Data-preserving field renames via RenameField. |
| ami/ml/tests.py | Adds unit/integration tests for mark_seen() and pipeline registration. |
| ui/src/data-services/models/processing-service.ts | Renames getters to lastSeen* and makes endpointUrl nullable-safe. |
| ui/src/data-services/models/pipeline.ts | Renames computed fields to processingServicesOnlineLastSeen. |
| ui/src/pages/project/processing-services/processing-services-columns.tsx | Updates status details display to use lastSeen. |
| ui/src/pages/project/pipelines/pipelines-columns.tsx | Renames "last checked" column to "last seen". |
| ui/src/pages/processing-service-details/processing-service-details-dialog.tsx | Updates label/value bindings to lastSeen. |
| ui/src/utils/language.ts | Adds FIELD_LABEL_LAST_SEEN string and English translation. |
| .agents/DATABASE_SCHEMA.md / .agents/AGENTS.md | Updates documentation references to last_seen_live and pull-mode heartbeat. |
Code review

Found 2 issues; fixed 3 (including the critical one). Responding to existing CodeRabbit/Copilot comments inline below.

Fixed in commit 671da1a:
Needs your input:
Responding to other comments:
🤖 Generated with Claude Code
🧹 Nitpick comments (2)
ui/src/data-services/models/processing-service.ts (1)
86-88: Consider guarding against null `last_seen_live`.

The backend model defines `last_seen_live = models.BooleanField(null=True)`, meaning it can be `null`. The return type `boolean` doesn't reflect this possibility. While the current `status` getter handles falsy values gracefully (treating `null` as `OFFLINE`), direct consumers of `lastSeenLive` may expect a definite boolean.

♻️ Optional: coerce to boolean
```diff
 get lastSeenLive(): boolean {
-    return this._processingService.last_seen_live
+    return this._processingService.last_seen_live ?? false
 }
```
ami/ml/tests.py (1)
192-204: Consider mocking the network call for deterministic tests.

This test makes a real network request to a nonexistent host, which depends on DNS resolution and socket timeouts. While it works, mocking `requests` or the underlying HTTP call would make this test faster and more reliable in CI environments with restricted network access.

Example mock approach
```python
from unittest.mock import patch

def test_get_status_updates_last_seen_for_sync_service(self):
    """Test that get_status() updates last_seen fields for sync services."""
    service = ProcessingService.objects.create(
        name="Sync Service", endpoint_url="http://nonexistent-host:9999"
    )
    service.projects.add(self.project)
    with patch('requests.Session.get') as mock_get:
        mock_get.side_effect = Exception("Connection refused")
        service.get_status(timeout=1)
    service.refresh_from_db()
    # assertions...
```
📒 Files selected for processing (7)

- ami/jobs/views.py
- ami/ml/serializers.py
- ami/ml/tasks.py
- ami/ml/tests.py
- ui/src/data-services/models/pipeline.ts
- ui/src/data-services/models/processing-service.ts
- ui/src/pages/project/processing-services/processing-services-columns.tsx
🚧 Files skipped from review as they are similar to previous changes (3)
- ui/src/pages/project/processing-services/processing-services-columns.tsx
- ami/jobs/views.py
- ui/src/data-services/models/pipeline.ts
I agree that a boolean with a default of Null is confusing. I think I chose that to represent "not checked" or "unknown", so the frontend could show blank instead of Offline. But I think False would also be a fine simplification.
🧹 Nitpick comments (1)
ami/ml/tasks.py (1)
98-101: Minor: Comment arithmetic may be slightly off.

The comment states worst case is "8 + 2 + 8 + 4 + 8 = 30s" but with `retries=3` and `backoff_factor=2`, there are 4 total attempts (initial + 3 retries). The actual sum shown equals 30s, but if there are 4 attempts at 8s each plus backoff delays (0s, 2s, 4s), worst case would be closer to 38s. This doesn't affect functionality; just a documentation nit.
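The corrected arithmetic is easy to check directly. The 8s timeout and `retries=3` / `backoff_factor=2` settings are taken from the review thread, and the backoff delays of 0s/2s/4s follow the comment's own accounting:

```python
# Settings discussed in the review: per-request timeout 8s, 3 retries.
timeout_s = 8
retries = 3

attempts = retries + 1          # initial request + 3 retries = 4 attempts
backoff_delays = [0, 2, 4]      # delay before each retry, per the review comment

worst_case = attempts * timeout_s + sum(backoff_delays)  # 4 * 8 + 6 = 38
```

So the worst-case wait per unreachable service is about 38s, not 30s.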
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ami/ml/models/processing_service.py`:
- Around line 219-225: The status response currently lists all registered
pipelines even when the worker is not live; change the pipelines_online passed
into ProcessingServiceStatusResponse to be pipeline_names only when is_live is
True, otherwise pass an empty list (i.e., compute pipeline_names from
self.pipelines.values_list("name", flat=True) but use it conditionally based on
is_live) so that ProcessingServiceStatusResponse(timestamp=self.last_seen or
datetime.datetime.now(), request_successful=is_live, server_live=is_live,
pipelines_online=..., pipeline_configs=[]) accurately reflects an offline
worker.
In `@ami/ml/tasks.py`:
- Around line 103-105: The offline-sweep timing is too slow relative to
PROCESSING_SERVICE_LAST_SEEN_MAX, causing last_seen_live to remain stale; either
make the sweep cadence (_BEAT_TASK_EXPIRES) no greater than
PROCESSING_SERVICE_LAST_SEEN_MAX or stop relying on a persisted last_seen_live
flag and derive liveness from the stored last_seen timestamp at read time;
update references to _BEAT_TASK_EXPIRES and any logic that sets/reads
last_seen_live (also in the block around the code at the 126-134 region) to use
the chosen approach so worker liveness reflects the heartbeat window.
- Around line 98-100: The periodic beat task iterates
ProcessingService.objects.sync_services() but assumes a fixed time_limit=150
while each unreachable sync service can cost ~38s (4 attempts × ~8s + backoff),
so with >3 services the task may exceed the hard limit and skip later services;
fix by computing the required budget from the queryset size (e.g., required_time
= num_services * per_service_worst_case + margin) and set time_limit
accordingly, or refactor to fan out work per service (enqueue a per-service sync
task) or process services in bounded chunks to ensure no single beat exceeds
time_limit; update the code references where time_limit=150 and the periodic
beat/sync loop that consumes ProcessingService.objects.sync_services() to
implement one of these solutions.
📒 Files selected for processing (5)

- ami/ml/models/pipeline.py
- ami/ml/models/processing_service.py
- ami/ml/tasks.py
- ui/src/data-services/models/processing-service.ts
- ui/src/pages/project/processing-services/processing-services-columns.tsx
✅ Files skipped from review due to trivial changes (1)
- ui/src/data-services/models/processing-service.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- ui/src/pages/project/processing-services/processing-services-columns.tsx
- ami/ml/models/pipeline.py
Actionable comments posted: 1
🧹 Nitpick comments (1)
ui/src/data-services/models/processing-service.ts (1)
88-90: Consider preserving `last_seen_live = null` as "unknown" instead of coercing to `false`.

Current logic collapses `null`/`undefined` to OFFLINE, which can hide "not yet observed" state for endpoint-based services. If tri-state behavior is desired, keep nullability and map it to `UNKNOWN` in `status`.

♻️ Proposed refactor
```diff
-  get lastSeenLive(): boolean {
-    return this._processingService.last_seen_live ?? false
+  get lastSeenLive(): boolean | null | undefined {
+    return this._processingService.last_seen_live
   }
 @@
-  const status_code = this.lastSeenLive ? 'ONLINE' : 'OFFLINE'
+  if (this.lastSeenLive == null) {
+    return ProcessingService.getStatusInfo('UNKNOWN')
+  }
+  const status_code = this.lastSeenLive ? 'ONLINE' : 'OFFLINE'
```

Please confirm intended product semantics for endpoint services with `last_seen_live = null` (offline vs unknown).

Also applies to: 105-106
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ui/src/data-services/models/processing-service.ts`:
- Around line 60-62: The isAsync getter currently returns
this._processingService.is_async which may be undefined and violates the boolean
return type; update the getter (isAsync) to defensively coerce the value to a
boolean (for example using `?? false` or `Boolean(...)`) when reading
this._processingService.is_async so it always returns a boolean similar to the
defensive pattern used in lastSeenLive.
📒 Files selected for processing (1)
- ui/src/data-services/models/processing-service.ts
…1122) Rename fields to better reflect the semantic difference between sync and async processing service status tracking:

- last_checked → last_seen
- last_checked_live → last_seen_live
- last_checked_latency → last_seen_latency

For sync services with endpoint URLs, fields are updated by the periodic status checker. For async/pull-mode services, a new mark_seen() method is called when the service registers pipelines, recording that we heard from it. Also updates all references in serializers, pipeline queryset, views, frontend models, columns, dialog, and language strings.

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Add structured queryset methods and a heartbeat mechanism so async
(pull-mode) processing services stay in sync with their actual liveness.
ProcessingService:
- New ProcessingServiceQuerySet with async_services() / sync_services()
methods — single canonical filter for endpoint_url null-or-empty, used
everywhere instead of ad-hoc Q expressions
- is_async property (derived from endpoint_url, no DB column)
- Docstrings reference Job.dispatch_mode ASYNC_API / SYNC_API for context
Liveness tracking:
- PROCESSING_SERVICE_LAST_SEEN_MAX = 60s constant (12× the worker's 5s
poll interval) — async services are considered offline after this
- check_processing_services_online task now handles both modes:
sync → active /readyz poll; async → bulk mark stale via async_services()
- _mark_pipeline_pull_services_seen() helper in jobs/views.py: single bulk
UPDATE via job.pipeline.processing_services.async_services(), called at
the top of both /jobs/{id}/tasks/ and /jobs/{id}/result/ so every worker
poll cycle refreshes last_seen without needing a separate registration
Async job cleanup (from carlosg/redisatomic):
- Rename _cleanup_job_if_needed → cleanup_async_job_if_needed and export
it so Job.cancel() can call it directly without a local import
- JobLogHandler: refresh_from_db before appending to avoid last-writer-
wins race across concurrent worker processes
- Job.logger: update existing handler's job reference instead of always
adding a new handler (process-level singleton leak fix)
Co-Authored-By: Claude <noreply@anthropic.com>
- PROCESSING_SERVICE_LAST_SEEN_MAX = 60s constant (12x the worker's 5s poll interval) used by check_processing_services_online to expire stale async service heartbeats
- get_status() pull-mode branch: derives server_live from staleness check, populates pipelines_online from registered pipelines, uses `not self.endpoint_url` to also handle empty-string endpoints
- endpointUrl getter: returns undefined instead of stringified "null" so async services show a blank cell in the endpoint column

Co-Authored-By: Claude <noreply@anthropic.com>
- Fix ImportError: import PROCESSING_SERVICE_LAST_SEEN_MAX directly from ami.ml.models.processing_service (not re-exported from ami.ml.models)
- Fix null last_seen causing epoch timestamp in processingServicesOnlineLastSeen getter — filter out null values before Math.max
- Fix "Last seen undefined" rendered in status column when lastSeen is undefined

Co-Authored-By: Claude <noreply@anthropic.com>
Move the async service stale-check to the top of check_processing_services_online so it always runs, even if a slow sync service check hits the time limit. Reduce the per-request timeout for the beat task from 90s (designed for cold-start waits) to 8s — if a sync service is starting up it will recover on the next cycle. Raise soft_time_limit/time_limit accordingly to give the sync loop room to complete (worst case ~30s per service with retries). Co-Authored-By: Claude <noreply@anthropic.com>
Async services now derive liveness from heartbeats rather than returning an error message. Update assertions: server_live=False (not None) when no heartbeat has been received, and remove error message checks. Co-Authored-By: Claude <noreply@anthropic.com>
Filter async services by project when marking them as seen, preventing cross-project contamination when a pipeline is shared across projects. Clarify in the docstring that this still marks all async services on the pipeline within the project, not the individual caller, until application-token auth (PR #1117) is available. Co-Authored-By: Claude <noreply@anthropic.com>
Add is_async to ProcessingServiceSerializer and ProcessingServiceNestedSerializer so the frontend can distinguish pull-mode from push-mode services. Also normalize empty endpoint_url strings to undefined in the FE model for consistency with the backend. Co-Authored-By: Claude <noreply@anthropic.com>
…le copies

- Run async stale-check before the sync loop so it always executes regardless of how long sync checks take
- Reduce per-request timeout for the beat task from 90s (designed for cold-start waits) to 8s — a slow or unreachable service just waits for the next 5-minute cycle
- Add expires=240s so copies queued during a worker outage are discarded when the worker returns; only the most recent firing runs

Co-Authored-By: Claude <noreply@anthropic.com>
- Coerce `last_seen_live` null to false in the getter to match the boolean return type (backend field is nullable). - Use `translate(STRING.FIELD_LABEL_LAST_SEEN)` in the status column instead of a hardcoded English string, matching the pattern in the details dialog. Co-Authored-By: Claude <noreply@anthropic.com>
…ogging

- Fix potential UnboundLocalError in select_processing_service() when all online services have last_seen_latency=None (e.g. async/pull-mode services). Falls back to the first online service and logs that no latency data exists.
- Use logger.exception() instead of logger.error() in the beat task exception handler to preserve the full traceback for debugging.
- Fix comment arithmetic for worst-case timeout calculation.

Co-Authored-By: Claude <noreply@anthropic.com>
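The latency fallback described in that commit can be sketched with plain tuples. The real `select_processing_service()` works over Django model instances; the function and field ordering here are illustrative only:

```python
def select_lowest_latency(services):
    """Pick the live service with the lowest known latency.

    services: iterable of (name, live, latency) where latency may be None
    (e.g. async/pull-mode services that are never actively polled).
    Falls back to the first live service when no live service has latency
    data, instead of leaving the selection unassigned (UnboundLocalError).
    """
    live = [s for s in services if s[1]]
    if not live:
        return None
    with_latency = [s for s in live if s[2] is not None]
    if with_latency:
        return min(with_latency, key=lambda s: s[2])
    return live[0]  # latency unknown for all live services


chosen = select_lowest_latency(
    [("async-a", True, None), ("sync-b", True, 0.21), ("sync-c", False, 0.05)]
)
```

Note that `sync-c` is skipped despite its low latency because it is not live.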
Async/pull-mode services (no endpoint URL) now show a gray "Unknown" status indicator instead of the misleading ONLINE/OFFLINE based on heartbeat data that applies to all services on the pipeline. The pipeline-level indicator in the dropdown continues to work as before. Co-Authored-By: Claude <noreply@anthropic.com>
Async/pull-mode services (no endpoint URL) are now treated as "online" in the pipeline selector since their status cannot be verified via active polling. This prevents the "Process Now" dropdown from incorrectly disabling async pipelines when no recent heartbeat exists. Co-Authored-By: Claude <noreply@anthropic.com>
Force-pushed from 857f5d3 to 770de40.
…hecks

Use the explicit is_async field from the backend serializer rather than inferring async status from the absence of endpoint_url.

Co-Authored-By: Claude <noreply@anthropic.com>
- Return empty pipelines_online list when async service is offline, avoiding contradictory payload (server_live=False + non-empty pipelines).
- Add ?? false to isAsync getter to match defensive pattern used by lastSeenLive.

Co-Authored-By: Claude <noreply@anthropic.com>
…s list

The write-only project field was declared on the serializer but not included in Meta.fields, causing an AssertionError at runtime. Missed during rebase conflict resolution.

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>

Summary
Closes #1122 & #1166
- Rename `last_checked` → `last_seen`, `last_checked_live` → `last_seen_live`, `last_checked_latency` → `last_seen_latency` on the `ProcessingService` model
- New `mark_seen()` method for async/pull-mode services to record liveness when they register pipelines
- Pipeline registration endpoint (`POST /api/v2/projects/{id}/pipelines/`) now calls `mark_seen(live=True)` after successful registration
- Job task-fetch/result-submit endpoints emit heartbeats (`_mark_pipeline_pull_services_seen`), scoped to the job's project
- New `is_async` computed property on the `ProcessingService` serializer
- Fix `UnboundLocalError` in `select_processing_service()` when all online services have no latency data (e.g. async services)
- Migration uses `RenameField` (data-preserving, no data loss)

The naming better reflects the semantic difference between:
- Sync services (with `endpoint_url`): Antenna actively checks the service via the periodic Celery Beat task, updating `last_seen`/`last_seen_live` from the health check response
- Async/pull-mode services (no `endpoint_url`): Workers report in by registering pipelines and polling for tasks, and we record when we last heard from them via `mark_seen()` and heartbeat updates

Test plan
- New unit tests for the `mark_seen()` method (live=True and live=False)
- Tests that heartbeats update `last_seen`/`last_seen_live`
- `TestProcessingServiceAPI` tests pass with renamed fields
- `TestPipeline` and `TestProjectPipelinesAPI` tests pass

Follow-up items
- One pipeline picker uses the generic `EntityPicker` (name only, no status indicator) while the session "Process Now" view uses a custom `PipelinesPicker` with status dots and per-pipeline disabling. These should be reconciled — either extract `PipelinesPicker` into a shared component or enhance `EntityPicker` to support optional status rendering.
- Once application-token auth (PR #1117) is available, workers could send an `X-Service-ID` header so the heartbeat targets the specific calling service. This would allow upgrading the "Unknown" status for async services to real ONLINE/OFFLINE.
- `test_get_status_updates_last_seen_for_sync_service` hits a real network address — should be mocked for faster CI.