Skip to content

Fix ActivityStub class validation to prevent replay corruption (#381)#382

Closed
rmcdaniel wants to merge 339 commits intomasterfrom
fix/activity-stub-class-validation
Closed

Fix ActivityStub class validation to prevent replay corruption (#381)#382
rmcdaniel wants to merge 339 commits intomasterfrom
fix/activity-stub-class-validation

Conversation

@rmcdaniel
Copy link
Copy Markdown
Member

Summary

Fixes #381 - ActivityStub now validates activity class during replay to prevent data corruption

Problem

replayed stored results by index only, without verifying the activity class. When workflow code was deployed with new/changed conditional branches, in-flight workflows would replay stored results from the wrong activities, silently corrupting data.

Example Scenario

  1. V1 workflow: LoadData (index 0) → TransformData (index 1) → SaveData (index 2)
  2. V2 workflow adds validation: LoadData (index 0) → ValidateData (index 1) → TransformData (index 2) → SaveData (index 3)
  3. Replay corruption: ValidateData at index 1 gets TransformData's result, TransformData at index 2 gets SaveData's result (integer 42), SaveData dispatches with corrupted data

Solution

Added class validation in after finding log by index. If the stored log's class doesn't match the expected activity (and isn't an Exception), treat it as if there's no log and dispatch the activity fresh.

Changes

  • src/ActivityStub.php: Added class validation check after
  • tests/Unit/ActivityStubTest.php: Added regression test

Testing

  • New test verifies that when a stored log has TestOtherActivity but code expects TestActivity, the wrong result isn't replayed
  • All existing tests pass (CI will verify)
  • Exception forwarding behavior is preserved (class validation excludes )

Impact

  • Prevents silent data corruption when deploying workflow code changes
  • No breaking changes - correct workflows continue to replay normally
  • Degrades gracefully - mismatched activities are dispatched fresh rather than replaying wrong data

🤖 Generated with Claude Code

durable-workflow-ops and others added 25 commits April 14, 2026 06:00
…corder

Both the resume workflow task (created after activity completion) and
the retry activity task were missing the namespace column. This caused
namespace-scoped polls to never find these tasks, silently hanging
workflows after the first activity completes.

Fixes #176

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Consolidate workflow_schedules storage onto the package as the canonical
source of truth. Migration 157 now carries the full unified schema
(ULID PK, spec/action JSON, status enum, overlap policies, buffer queue,
skip tracking); follow-up migration 158 was folded in and removed.

- WorkflowSchedule: computed accessors (workflow_type, cron_expression,
  timezone) read from spec/action; buffer and recent-actions helpers.
- ScheduleOverlapPolicy: add BufferAll case and isBuffer() helper.
- ScheduleManager: rewritten with create() single-cron convenience +
  createFromSpec() rich form, transactional trigger(), two-phase tick(),
  backfill() via computeNextFireAt, skip-reason tracking.
- ScheduleDescription: new constructor (spec, action, firesCount,
  failuresCount, nextFireAt, lastFiredAt, note).
- V2ScheduleTest: rewritten for new field names; 31/32 pass (the one
  remaining failure is the pre-existing testSkipOverlapPolicyPrevents…
  that predates this work).

Partial progress on #161; see issue for handoff. Server-side port
still outstanding.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ports ~210 lines of command validation/normalization from the server's
WorkerController into Workflow\V2\Support\WorkflowCommandNormalizer, so
the package is the single source of truth for the worker-command grammar
(complete, fail, schedule_activity, start_timer, start_child_workflow,
continue_as_new, record_side_effect, record_version_marker,
upsert_search_attributes).

PayloadEnvelopeResolver moves with it — it already depended on
Workflow\Serializers and is used by every surface that accepts payloads.

Retires issue #187 (P1 tech debt).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a codec+blob envelope alongside the decoded result on query and
update command responses so non-PHP SDKs can consume the raw payload
without re-running the server's serializer.
…, facade, and tests

Adds a read-only status() method to the WorkflowTaskBridge contract that
returns task liveness, lease, and run metadata without side effects. This
aligns the workflow task bridge with the activity task bridge pattern
(ActivityTaskBridge::status()) and enables the server to delegate lease
validation to the bridge instead of querying the DB directly (TD-S029).

7 new tests, 27 assertions covering: leased task metadata, expired lease
detection, ready task without lease, task not found, activity task
rejection, run status propagation, and null attempt_count normalization.
…w (TD-S020)

The bridge path for external workers now reads parent_close_policy from
the start_child_workflow command and sets it on the WorkflowLink row and
ChildWorkflowScheduled history event, matching the executor path. The
normalizeStartChildWorkflowCommand() method now preserves the field
through command parsing. Defaults to abandon when absent.

2 new tests: explicit terminate policy threading and default abandon
policy when omitted.
…elegation (#189)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Updated normalizeContinueAsNewCommand() to capture the 'queue' field from
the command array, and updated applyContinueAsNew() to use the command queue
when present, falling back to the original run's queue.

Before: continue_as_new always used the original run's queue, silently
dropping the worker's queue routing intent

After: continue_as_new respects the worker's queue field when provided

Part of multi-repo fix for #178 (workflow, server validation already correct,
sdk-python updated separately)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Expose documented long-poll wake signal interface in workflow package
to enable cross-node coordination in multi-server deployments.

New contract: Workflow\V2\Contracts\LongPollWakeStore
- snapshot(array $channels): array - capture current channel versions
- changed(array $snapshot): bool - detect version changes
- signal(string ...$channels): void - signal work availability
- workflowTaskPollChannels/activityTaskPollChannels/historyRunChannel helpers

New implementation: Workflow\V2\Support\CacheLongPollWakeStore
- Cache-backed default implementation
- Works with any Laravel cache driver (Redis, database, Memcached, file)
- Multi-node safe when using shared cache backends

Contributes to resolving TD-S002.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes #381

ActivityStub::make() now validates that stored log entries match the
expected activity class before replaying results. When workflow code
changes introduce new conditional branches, the index sequence can shift,
causing wrong results to be replayed from different activities.

Changes:
- Added class validation in ActivityStub::make() after finding log by index
- If stored class doesn't match expected activity (and isn't an Exception),
  treat as no log and dispatch fresh activity
- Added regression test for class mismatch scenario

This prevents silent data corruption when deploying workflow code changes
that alter the activity sequence.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@rmcdaniel
Copy link
Copy Markdown
Member Author

Closing - created in wrong location. Should not be making PRs to master.

@rmcdaniel rmcdaniel closed this Apr 15, 2026
@rmcdaniel rmcdaniel deleted the fix/activity-stub-class-validation branch April 15, 2026 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ActivityStub replays by index only branching changes corrupt in-flight workflows

2 participants