Skip to content

[IN] Global register allocator#5

Merged
matgla merged 142 commits intomobfrom
otherArchRemoval
Feb 7, 2026
Merged

[IN] Global register allocator#5
matgla merged 142 commits intomobfrom
otherArchRemoval

Conversation

@matgla
Copy link
Owner

@matgla matgla commented Feb 7, 2026

and a lot of other things + benchmarks and tests

@matgla matgla requested a review from Copilot February 7, 2026 21:33
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR significantly expands ARM Thumb (ARMv8-M) backend infrastructure and documentation while removing unused ARM64 stubs, aiming to support a global register allocator and related optimizations, benchmarks, and test workflows.

Changes:

  • Added ARM Thumb backend components (scratch register management, call-site tracking, ABI classification, arch/FPU configs).
  • Added extensive design/validation documentation and benchmark analysis notes for IR/codegen optimizations.
  • Added container + CI scaffolding and repo tooling metadata (Dockerfile, workflows, submodules, editor configs); removed ARM64 dummy assembler/linker code.

Reviewed changes

Copilot reviewed 48 out of 536 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
asm_token_fix_plan.md Documents migration plan for new ASM token layout in Thumb assembler
arm64-link.c Removes AArch64 linker/relocation implementation from this branch
arm64-asm.c Removes ARM64 dummy assembler stub
arm-tok.h Removes legacy ARM token definitions header
arm-thumb-scratch.c Adds scratch register allocation/save/restore helpers for Thumb codegen
arm-thumb-defs.h Introduces Thumb target defs, ABI enums, generator state structs, suffix parsing types
arm-thumb-callsite.c Adds call-site table + IR scan utilities for ABI call layout reconstruction
arch/fpu/arm/fpv5-sp-d16.h Declares FPv5-SP-D16 floating-point config
arch/fpu/arm/fpv5-sp-d16.c Defines FPv5-SP-D16 floating-point capabilities/config
arch/armv8m.c Adds ARMv8-M ArchitectureConfig defaults
arch/arm_aapcs.c Adds AAPCS-like argument classification + call layout capacity helpers
TCC_GCC_CODEGEN_COMPARISON.md Documents codegen gaps vs GCC and targeted optimizations
TAL_TOKSTR_OPTIMIZATION.md Documents plan to reduce peak memory by changing tok_str allocation strategy
REFACTORING_SUMMARY.md Summarizes “architecture independence” refactor and new machine/opt modules
POSTINC_EMBEDDED_DEREF_PLAN.md Documents plan for post-inc fusion when deref is embedded in expressions
OPTIMIZATION_VALIDATION_REPORT.md Records which optimization phases are implemented/validated
OPTIMIZATION_QUICK_WIN_1_FP_CACHE.md Documents FP-offset cache implementation and tests
OPTIMIZATION_PLAN_V2.md Updated optimization plan and analysis (LR usage, indexed loads, post-inc, etc.)
OPTIMIZATION_FP_CACHE_IMPROVEMENT_PLAN.md Documents FP cache “register-agnostic” approach and limitations
MLA_WITH_DEREFERENCES_PLAN.md Documents plan to enable MLA fusion when MUL operands are dereferences
LICM_IMPLEMENTATION_STATUS.md Status/report on LICM implementation and known bugs
LICM_IMPLEMENTATION_PLAN.md Detailed LICM design plan
LAZY_SECTION_LOADING_ALL_PLAN.md Notes reverted attempt to defer-load all sections and failure reasons
LAZY_SECTION_LOADING.md Documents lazy section loading design/implementation
FP_CACHE_IR_LEVEL_PLAN.md Proposes IR-level FP offset caching as a cleaner approach
Dockerfile Adds Ubuntu-based build/test container with ARM toolchain + QEMU
BUBBLE_SORT_COMPARISON.md Detailed bubble-sort codegen diff analysis and optimization priorities
AGENTS.md Adds project agent/guide documentation, structure, build/test notes
.vscode/tasks.json Adds QEMU run/stop tasks
.vscode/settings.json Adds terminal auto-approve configuration for tooling
.vscode/launch.json Adds debug configs (tcc under gdb, QEMU gdb attach)
.vscode/c_cpp_properties.json Adds IntelliSense settings/defines for ARM Thumb target
.gitmodules Adds new submodules (newlib, pico-sdk, mibench)
.github/workflows/docker-build.yml Adds workflow to build/push container image to GHCR
.github/workflows/ci.yml Adds CI workflow running tests inside GHCR container
.github/prompts/plan-aliasAnalysisStoreLoadOptimizations.prompt.md Adds prompt doc for alias-analysis/store-load optimization plan
.dockerignore Adds docker ignore rules for repo build artifacts and tooling dirs
.clang-format Updates formatting rules (including brace style + column limit)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


#pragma once

#define USING_GLOBALS
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining USING_GLOBALS in a header is very likely to cause multiple-definition / ODR-style linker errors because every translation unit including this header will pull in global definitions from tcc.h. Move #define USING_GLOBALS into exactly one .c file (typically the implementation file) or remove it entirely from the header and include only what’s needed (e.g., forward-declare FloatingPointConfig or include a smaller header that doesn’t depend on USING_GLOBALS).

Suggested change
#define USING_GLOBALS

Copilot uses AI. Check for mistakes.
Comment on lines +40 to +49
if (layout->capacity < needed)
{
int new_cap = layout->capacity ? layout->capacity : 8;
while (new_cap < needed)
new_cap *= 2;
// layout->locs = (TCCAbiArgLoc *)tcc_realloc(layout->locs, sizeof(TCCAbiArgLoc) * new_cap);
/* Zero new tail for determinism. */
// memset(&layout->locs[layout->capacity], 0, sizeof(TCCAbiArgLoc) * (new_cap - layout->capacity));
layout->capacity = new_cap;
}
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

layout->capacity is increased without reallocating the backing arrays (the tcc_realloc/memset calls are commented out). This can desynchronize capacity from actual allocated sizes and can also cause tcc_abi_call_layout_ensure_capacity() to incorrectly skip reallocation later, leading to out-of-bounds writes. Replace this block with a call to tcc_abi_call_layout_ensure_capacity(layout, needed) (or fully re-enable correct realloc/zero-init here) and keep capacity consistent with the real allocations.

Suggested change
if (layout->capacity < needed)
{
int new_cap = layout->capacity ? layout->capacity : 8;
while (new_cap < needed)
new_cap *= 2;
// layout->locs = (TCCAbiArgLoc *)tcc_realloc(layout->locs, sizeof(TCCAbiArgLoc) * new_cap);
/* Zero new tail for determinism. */
// memset(&layout->locs[layout->capacity], 0, sizeof(TCCAbiArgLoc) * (new_cap - layout->capacity));
layout->capacity = new_cap;
}
tcc_abi_call_layout_ensure_capacity(layout, needed);

Copilot uses AI. Check for mistakes.
Comment on lines +115 to +124
for (int i = 0; i < count; ++i)
{
if (regs_to_save & (1u << result.regs[i]))
{
if (scratch_push_count < 128)
scratch_push_stack[scratch_push_count++] = result.regs[i];
else
tcc_error("compiler_error: scratch register push stack overflow (>128 pushes without restore)");
}
}
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you emit a single th_push(regs_to_save) with a register mask, the architectural stack layout is in register-number order, but scratch_push_stack records registers in allocation order (result.regs[]). If allocation order differs from ascending register number order, later POP validation/deferral and the “auto restore” path can pop registers in an order that doesn’t match what’s actually on the stack, restoring wrong values. A robust fix is to track pushes as masks on a push-mask stack (LIFO of uint32_t masks) instead of per-register, or record the saved registers in the exact order they are stored on the stack (ascending register number) for each mask push.

Copilot uses AI. Check for mistakes.

if (can_restore_all && check_count > 0)
{
fprintf(stderr, "[SCRATCH] Popping registers (mask=0x%x) in single instruction\n", allocs->saved_mask);
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fprintf(stderr, ...) calls are unconditional (not behind #ifdef ARM_THUMB_DEBUG_SCRATCH) and will spam stderr in normal builds/tests. Wrap them in the same debug macro used elsewhere in this file (or route through a project logging macro that can be compiled out).

Copilot uses AI. Check for mistakes.
}
else
{
fprintf(stderr, "[SCRATCH] WARNING: restore_scratch_regs out of order; deferring POP\n");
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fprintf(stderr, ...) calls are unconditional (not behind #ifdef ARM_THUMB_DEBUG_SCRATCH) and will spam stderr in normal builds/tests. Wrap them in the same debug macro used elsewhere in this file (or route through a project logging macro that can be compiled out).

Copilot uses AI. Check for mistakes.
Comment on lines 3 to 4
"make": true,
"heaptrack_print": true
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Committing editor settings that auto-approve terminal commands can create a footgun for contributors (commands may run without explicit confirmation). Consider removing this from the repo, scoping it to a local/user settings example, or setting these to false by default with documentation on how to opt-in locally.

Suggested change
"make": true,
"heaptrack_print": true
"make": false,
"heaptrack_print": false

Copilot uses AI. Check for mistakes.
BasedOnStyle: LLVM
ColumnLimit: 120
AllowShortFunctionsOnASingleLine: None
BreakBeforeBraces: Allman
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BreakBeforeBraces: Allman conflicts with the brace style documented in AGENTS.md (K&R style). This will cause churn and inconsistent formatting expectations. Align the documented style and .clang-format settings (either update the docs to Allman or adjust .clang-format to match the documented K&R approach).

Suggested change
BreakBeforeBraces: Allman
BreakBeforeBraces: Attach

Copilot uses AI. Check for mistakes.
@matgla matgla merged commit d1dae41 into mob Feb 7, 2026
1 check passed
@matgla matgla deleted the otherArchRemoval branch February 7, 2026 21:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant