Advancer and machine manager reliability improvements#748
Merged
Conversation
There was a problem hiding this comment.
Pull request overview
This PR improves reliability across the machine runtime, machine manager, and advancer by tightening error handling, adding bounded batching to avoid large in-memory backlogs, and making shutdown/cleanup more robust.
Changes:
- Introduces a structured
AdvanceResponsereturn type for machine advances and adds a reports-per-input limit with a corresponding completion status. - Adds batched input replay/synchronization for machine instances and surfaces a configurable
ADVANCER_INPUT_BATCH_SIZE. - Improves advancer and manager shutdown semantics (graceful HTTP shutdown, closing machine instances to avoid orphan processes) and snapshot handling safety.
Reviewed changes
Copilot reviewed 21 out of 23 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/machine/machine.go | Adds AdvanceResponse; wraps cancel/deadline errors; introduces ErrReportsLimitExceeded; updates Machine.Advance signature. |
| pkg/machine/implementation.go | Enforces report limits; replaces panics with errors; adds context checks in run loop; updates advance return shape. |
| pkg/machine/machine_test.go | Updates tests for new AdvanceResponse API and adds load-path error coverage. |
| pkg/machine/implementation_test.go | Adds coverage for outputs hash proof/checkpoint write and additional run-loop behaviors; updates advance tests to new API. |
| pkg/machine/libcartesi.go | Removes stray println on JSON unmarshal failure. |
| internal/manager/types.go | Extends interfaces: Synchronize now takes batchSize, OutputsProof signature change, and adds MachineProvider.Close. |
| internal/manager/manager.go | Adds instance factory abstraction; adds close semantics; switches synchronization to batched replay; improves snapshot fallback logic. |
| internal/manager/instance.go | Implements batched Synchronize; hardens runtime lifecycle with destroy-on-unrecoverable-error; adds close timeout draining inspects; atomic processed input tracking. |
| internal/manager/manager_test.go | Updates tests for new manager ctor/options and adds new error-path + Close aggregation tests. |
| internal/manager/instance_test.go | Adds extensive coverage for snapshot init, outputs proof/hash behaviors, synchronize batching, and close-timeout behavior. |
| internal/manager/pmutex/pmutex_test.go | Reworks contested lock tests to avoid testify’s Never helper and to ensure cleanup/unblock. |
| internal/advancer/service.go | Adds graceful HTTP server shutdown and machine manager close on Stop; propagates server failure via cancellation; wires batch size into manager creation. |
| internal/advancer/advancer.go | Fetches/processes inputs in batches; triggers shutdown on store-after-advance failure; improves snapshot path join + deletion safety checks. |
| internal/advancer/advancer_test.go | Refactors test setup; adds broad coverage for batching, epoch handling, snapshot lifecycle, and shutdown-triggering scenarios. |
| internal/config/generated.go | Adds CARTESI_ADVANCER_INPUT_BATCH_SIZE config with defaults and loaders for advancer/node. |
| internal/config/generate/Config.toml | Adds config generator entry for CARTESI_ADVANCER_INPUT_BATCH_SIZE. |
| internal/repository/postgres/schema/migrations/000001_create_initial_schema.up.sql | Extends InputCompletionStatus enum with REPORTS_LIMIT_EXCEEDED. |
| internal/repository/postgres/db/rollupsdb/public/enum/inputcompletionstatus.go | Updates generated enum mapping for REPORTS_LIMIT_EXCEEDED. |
| internal/model/models.go | Adds model enum value + scan handling for REPORTS_LIMIT_EXCEEDED. |
| internal/repository/postgres/repository_error_test.go | Minor formatting alignment change. |
| internal/inspect/inspect_test.go | Updates mock interface signatures for Synchronize/OutputsProof. |
| cmd/cartesi-rollups-cli/root/read/read.go | Minor formatting fix for flag setting. |
| Makefile | Exposes MACOSX_DEPLOYMENT_TARGET in make env output when set. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
843d1e9 to
ec16956
Compare
renatomaia
previously approved these changes
Mar 5, 2026
mpolitzer
previously approved these changes
Mar 5, 2026
mpolitzer
left a comment
There was a problem hiding this comment.
added a few comments, mostly nitpicks
ec16956 to
b2733c5
Compare
…simplify test setup
b2733c5 to
e69b14d
Compare
mpolitzer
approved these changes
Mar 5, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.