**Merged** (16 commits)

- `d9ac389` test(core): implement comprehensive mathematical invariants for Galax… (squid-protocol, May 11, 2026)
- `36d2553` docs(cobol): update legacy refraction controller documentation (squid-protocol, May 11, 2026)
- `a084848` docs: restructure wiki architecture and expand technical claims (squid-protocol, May 12, 2026)
- `085dcc6` test: reorganize test suite into modular domain directories (squid-protocol, May 12, 2026)
- `8496630` refactor: update core engine modules for test hardening alignment (squid-protocol, May 12, 2026)
- `65ebcd2` test: reorganize test suite into modular domain directories (squid-protocol, May 12, 2026)
- `e88c270` test: fix CI fixture pathing and python deprecation warnings (squid-protocol, May 12, 2026)
- `f7b9a83` test: bypass CodeQL false positive on URL substring in prism tests (squid-protocol, May 12, 2026)
- `eb1ac25` Potential fix for pull request finding 'CodeQL / Module is imported w… (squid-protocol, May 12, 2026)
- `140b995` fix: resolve flake8 undefined name error for defaultdict (squid-protocol, May 12, 2026)
- `bcaf175` fix: remove undefined collections prefix from defaultdict (squid-protocol, May 12, 2026)
- `6fb0c8d` chore: resolve CodeQL static analysis warnings (squid-protocol, May 12, 2026)
- `f2cd7b3` fix: restore pytest import in test_aperture.py (squid-protocol, May 12, 2026)
- `5f9bad6` test: fix spectral auditor orphan bypass in 50/0 law test (squid-protocol, May 12, 2026)
- `25efe2c` test: optimize 50/0 law audit test with Tier 0 orphan bypass (squid-protocol, May 12, 2026)
- `829b846` Release v1.0.0: Enterprise Test Matrix and Engine Finalization (squid-protocol, May 12, 2026)


**`docs/wiki/01-04-the-legacy-bridge.md`** (new file: 44 additions, 0 deletions)

# 01-04: The Legacy Bridge (Mainframe Modernization Philosophy)

Modernizing a 40-year-old mainframe monolith is one of the highest-risk engineering operations an enterprise can undertake. The failure rate is notoriously high, and the root cause is almost always the same: **the tooling relies on strict compilation.**

When traditional analysis tools (backed by Abstract Syntax Trees or compiler frontends) attempt to map a legacy COBOL repository, they crash. They crash because a `.cpy` (copybook) file is missing from the local disk. They crash because the code uses an undocumented dialect quirk from 1982. They crash because the execution flow is hidden inside a Job Control Language (JCL) macro that the parser cannot execute without an IBM emulator.

GitGalaxy solves this by abandoning the compiler entirely. Using **AST-free structural physics**, GitGalaxy acts as an architectural Rosetta Stone: it treats 40-year-old COBOL and JCL not as executable software, but as raw structural data.

This philosophy unlocks deterministic mainframe modernization without requiring a mainframe.

---

## 1. Escaping the Compiler Trap

Compilers demand perfection. If a single variable is undeclared, the AST generation halts, blinding the engineering team to the rest of the file.

GitGalaxy’s heuristic engine thrives on fragmented, broken, and incomplete code. Because it uses bounded optical regular expressions (the `LogicSplicer` and `LanguageLens`), it can parse COBOL-74, COBOL-85, and IBM Enterprise extensions side by side. It steps over syntax errors and missing dependencies to extract the structural truth of the system:
* Where does the data enter?
* What business logic mutates it?
* Where does it exit?
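
To make "stepping over" broken code concrete, here is a minimal Python sketch of line-scoped regex extraction. The patterns and the `extract_structure` helper are hypothetical illustrations, not GitGalaxy's actual `LogicSplicer` or `LanguageLens` rules. Because every pattern looks at one line at a time, an unresolvable `COPY` member or a line of garbage is simply skipped instead of aborting the scan.

```python
import re

# Illustrative, deliberately loose patterns (NOT GitGalaxy's real rules).
# Each pattern inspects a single line, so a missing copybook or a malformed
# statement only costs that one line.
RULES = {
    "enters": re.compile(r"\b(?:SELECT\s+([\w-]+)\s+ASSIGN|READ\s+([\w-]+)|ACCEPT\s+([\w-]+))", re.I),
    "mutates": re.compile(r"\b(?:MOVE\s+.+?\s+TO\s+([\w-]+)|ADD\s+.+?\s+TO\s+([\w-]+)|COMPUTE\s+([\w-]+))", re.I),
    "exits": re.compile(r"\b(?:WRITE\s+([\w-]+)|REWRITE\s+([\w-]+)|DISPLAY\s+([\w-]+))", re.I),
}


def extract_structure(source):
    """Return {category: [(line_number, identifier), ...]} without compiling anything."""
    found = {category: [] for category in RULES}
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.lstrip().startswith("*"):
            continue  # comment line (fixed-format column handling is omitted here)
        for category, pattern in RULES.items():
            for match in pattern.finditer(line):
                # Exactly one alternative matched; take its captured identifier.
                name = next(group for group in match.groups() if group)
                found[category].append((lineno, name.upper()))
    return found
```

A line such as `COPY MISSINGBOOK.` or outright garbage matches nothing and is ignored, while the surrounding `SELECT`, `COMPUTE`, `MOVE`, and `WRITE` lines are still classified.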

## 2. Core Modernization Capabilities

By removing the need for an emulator, GitGalaxy provides three core capabilities for legacy extraction:

### A. Execution DAG Mapping (The Architect)
In a mainframe, COBOL programs rarely run in isolation; they are orchestrated by JCL scripts that handle file assignments and step execution. GitGalaxy parses both the COBOL `SELECT` statements and the JCL `DD` statements to build a complete graph of `Producer -> Consumer` file dependencies.
When that graph is a **Directed Acyclic Graph (DAG)**, a topological sort yields the exact execution order; when it is not, the steps caught in a cycle are flagged immediately, highlighting cyclic deadlocks and architectural bottlenecks.
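
A minimal sketch of the ordering step, assuming the `DD` statements have already been extracted into `(step, dataset, status)` triples, where `status` is the first subparameter of `DISP=(status,normal,abnormal)`. It treats `NEW` as the producer and every other status as a consumer, which is a simplification (`MOD`, for example, can also create a dataset). The function name and input shape are hypothetical; the algorithm is Kahn's topological sort, and any steps left unscheduled sit on or downstream of a cycle.

```python
from collections import defaultdict


def execution_order(dd_statements):
    """Return (ordered_steps, unschedulable_steps) from (step, dataset, status) triples.

    status is the first DISP subparameter: NEW marks the producer; SHR, OLD and
    MOD are treated as consumers (a simplification).
    """
    steps = {step for step, _, _ in dd_statements}
    producers = {}
    consumers = defaultdict(set)
    for step, dataset, status in dd_statements:
        if status.upper() == "NEW":
            producers[dataset] = step
        else:
            consumers[dataset].add(step)

    # Producer -> Consumer edges; datasets nobody creates come from outside the job.
    successors = defaultdict(set)
    indegree = dict.fromkeys(steps, 0)
    for dataset, producer in producers.items():
        for consumer in consumers[dataset] - {producer}:
            if consumer not in successors[producer]:
                successors[producer].add(consumer)
                indegree[consumer] += 1

    # Kahn's algorithm; keeping the ready list sorted makes the order deterministic.
    ready = sorted(step for step, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        step = ready.pop(0)
        order.append(step)
        for nxt in successors[step]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
        ready.sort()
    # Anything left still has unmet inputs: it is on, or downstream of, a cycle.
    unschedulable = sorted(step for step, degree in indegree.items() if degree > 0)
    return order, unschedulable
```

For a three-step payroll job where `STEP01` creates `PAY.RAW`, `STEP02` reads it and creates `PAY.NET`, and `STEP03` reads `PAY.NET`, the sketch returns the order `STEP01, STEP02, STEP03` with no unschedulable steps.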

### B. Microservice Slicing (Taint Tracking)
Legacy monoliths are tightly coupled. Extracting a single business function (e.g., "Calculate Payroll") usually requires untangling thousands of lines of unrelated state mutations.
GitGalaxy employs a **Recursive Alias Engine** that traces variable taints across `MOVE`, `ADD`, and `COMPUTE` statements. It slices out the exact lines of business logic a microservice requires, while ignoring the dead code already isolated by the *Graveyard Reaper*.
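
The slicing idea can be sketched as a flow-insensitive backward slice, assuming these three verbs are the only data movers. This is a simplified stand-in for the Recursive Alias Engine, not its implementation: it iterates to a fixed point instead of recursing, and it ignores control dependencies (`IF`, `PERFORM`) and `REDEFINES` aliasing. Being flow-insensitive, it may keep an extra line but should not drop a statement of the forms it parses that feeds the target. `defs_and_uses` and `backward_slice` are hypothetical names.

```python
import re

IDENTIFIER = re.compile(r"[A-Z][A-Z0-9-]*")
LITERAL = re.compile(r"'[^']*'|\"[^\"]*\"")
NOISE = {"TO", "GIVING", "ROUNDED"}


def _names(text):
    return {word for word in IDENTIFIER.findall(text) if word not in NOISE}


def defs_and_uses(statement):
    """Return (variables written, variables read) for one MOVE/ADD/COMPUTE."""
    text = LITERAL.sub(" ", statement.upper()).strip().rstrip(".")
    verb, _, rest = text.partition(" ")
    if verb == "COMPUTE":
        target, _, expression = rest.partition("=")
        return _names(target), _names(expression)
    if verb == "MOVE":
        source, _, targets = rest.partition(" TO ")
        return _names(targets), _names(source)
    if verb == "ADD":
        operands, _, giving = rest.partition(" GIVING ")
        addends, _, targets = operands.partition(" TO ")
        if giving:  # ADD A [TO] B GIVING C: C = A + B
            return _names(giving), _names(operands)
        return _names(targets), _names(addends) | _names(targets)  # ADD A TO B: B = B + A
    return set(), set()


def backward_slice(statements, target):
    """Indices of statements that can influence `target` (flow-insensitive fixed point)."""
    relevant = {target.upper()}
    kept = set()
    changed = True
    while changed:
        changed = False
        for index, statement in enumerate(statements):
            if index in kept:
                continue
            written, read = defs_and_uses(statement)
            if written & relevant:
                kept.add(index)
                relevant |= read
                changed = True
    return sorted(kept)
```

Slicing on `WS-NET` in a paragraph that also bumps an unrelated counter and sets an unrelated flag keeps only the rate, gross, bonus, and net computations.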

### C. Zero-Trust Java Forging
GitGalaxy does not just map the old architecture; it scaffolds the new one.
By extracting legacy `PIC` clauses, EBCDIC byte boundaries, and `COMP-3` packed-decimal constraints, the engine's Forges automatically translate mainframe structures into modern equivalents. They generate strict Spring Boot `@RestController` APIs, JPA `@Entity` schemas, and Java data-decoding utilities that match the mainframe byte-for-byte. Because the translation is deterministic, there is no generative model in the loop to hallucinate.
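
The decoding half of that work is well defined: a `COMP-3` field stores two decimal digits per byte with the sign in the final nibble, and `DISPLAY` text is EBCDIC. The Python sketch below is illustrative only (the Forges emit Java, and these helper names are hypothetical). It assumes code page 037 (EBCDIC US/Canada); real installations may use another code page such as 500 or 1140.

```python
from decimal import Decimal


def packed_length(digits):
    """Bytes used by a COMP-3 field: one nibble per digit plus a sign nibble, rounded up."""
    return digits // 2 + 1


def unpack_comp3(data, scale=0):
    """Decode IBM packed decimal; `scale` is the digit count after the implied V.

    PIC S9(5)V99 COMP-3 -> 7 digits, packed_length(7) == 4 bytes, scale=2.
    """
    nibbles = []
    for byte in data:
        nibbles.extend((byte >> 4, byte & 0x0F))
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError(f"invalid digit nibble in {data.hex()}")
    if sign in (0xB, 0xD):
        negative = 1
    elif sign in (0xA, 0xC, 0xE, 0xF):
        negative = 0
    else:
        raise ValueError(f"invalid sign nibble {sign:X}")
    # Decimal's tuple form keeps the value exact: (sign, digits, exponent).
    return Decimal((negative, tuple(digits), -scale))


def decode_ebcdic(data, codepage="cp037"):
    """Decode a DISPLAY (text) field; cp037 is EBCDIC US/Canada."""
    return data.decode(codepage)
```

For example, the four bytes `12 34 56 7C` of a `PIC S9(5)V99 COMP-3` field decode to `12345.67`; ending the last byte in `D` instead gives `-12345.67`. Using `Decimal` rather than `float` mirrors the byte-for-byte goal, since binary floating point cannot represent most cent values exactly.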

## 3. The LLM Context Constraint

While autonomous AI agents and Large Language Models (LLMs) are highly capable of translating COBOL syntax to Java, they lack global context. If an LLM translates a COBOL file that implicitly relies on a missing JCL step, the resulting Java code can compile cleanly and still fail in production.

GitGalaxy serves as the deterministic bridge. Before an AI agent writes a single line of Java, GitGalaxy injects a strict remediation ticket containing the exact external dependencies, required I/O boundaries, and "honesty flags" (e.g., *“This module assumes EBCDIC encoding”*).

By grounding probabilistic AI models in deterministic structural physics, GitGalaxy guarantees that modernized microservices reflect the absolute reality of the legacy monolith.

**`docs/wiki/01-07-the-shbom-standard.md`** (new file: 60 additions, 0 deletions)

# 01-07: The SHBOM Standard (Structural Health Bill of Materials)

> **The Illusion of the SBOM**
>
> The software industry has heavily adopted the SBOM (Software Bill of Materials). Driven by executive orders and cybersecurity mandates, enterprises are rushing to generate manifests of their open-source dependencies.
>
> But an SBOM is just an ingredient list. An SBOM tells you that a building was constructed using steel, glass, and concrete. It does not tell you that the steel is rusting, the concrete is fracturing under extreme cognitive load, or that a single load-bearing pillar represents a catastrophic single point of failure.
>
> Knowing *what* is in your software does not mean you know the *health* of your software.

GitGalaxy introduces a new enterprise standard: the **SHBOM (Structural Health Bill of Materials)**.

The SHBOM is a deterministic, point-in-time mathematical snapshot of a repository's complete architectural reality. It is the core justification for the GitGalaxy engine: it turns subjective code-quality debates into objective, auditable liability metrics.

---

## 1. What is the SHBOM?

While a standard SBOM outputs a JSON list of packages and versions, the GitGalaxy SHBOM (exported natively via the `AuditRecorder` and the SQLite-backed `RecordKeeper`) captures the structural physics and risk exposures of the entire proprietary ecosystem.

A generated SHBOM deterministically records the state of:
* **Structural Liabilities:** The exact density of Technical Debt, Cognitive Load, and State Flux across the monolithic codebase.
* **Network Topology:** The precise Blast Radius (PageRank) and Choke Points (Betweenness Centrality) of every file. It identifies the "God Nodes" that, if broken, shatter the application.
* **Threat Surfaces:** The physical exposure of the system to RCE Funnels, unhandled exceptions, and obscured payloads.
* **Physical Supply Chain Verification:** Rather than just trusting `package.json`, the SHBOM physically audits the installed dependencies on disk, proving they are not spoofed, infected with high-entropy payloads, or hiding malicious execution headers.
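
As an illustration of the "high-entropy payload" and "execution header" checks, here is a minimal sketch over raw bytes. The magic numbers are standard (ELF, PE/MZ, 64-bit little-endian Mach-O), but the 7.2 bits-per-byte threshold and the `audit_blob` helper are assumptions. Compressed archives and images are legitimately high-entropy, so a real audit would need an allow-list by file type.

```python
import math
from collections import Counter

# Standard magic numbers for native executables.
EXECUTABLE_MAGIC = {
    b"\x7fELF": "ELF",
    b"MZ": "PE/MZ",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit",
}


def shannon_entropy(data):
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniformly spread bytes."""
    total = len(data)
    if total == 0:
        return 0.0
    # "+ 0.0" turns a -0.0 result (single repeated byte) into 0.0.
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values()) + 0.0


def audit_blob(name, data, entropy_threshold=7.2):
    """Flag native executable headers and suspiciously random content in one file."""
    findings = []
    for magic, kind in EXECUTABLE_MAGIC.items():
        if data.startswith(magic):
            findings.append(f"{name}: native {kind} header")
    entropy = shannon_entropy(data)
    if entropy > entropy_threshold:
        findings.append(f"{name}: high entropy ({entropy:.2f} bits/byte)")
    return findings
```

Ordinary source text typically lands well below the threshold, while encrypted or packed payloads approach the 8.0 ceiling.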

## 2. The Enterprise Justification (The "Why")

Why does an enterprise need an AST-free structural parser running at hyper-velocity? Because architectural rot is a financial liability. The SHBOM provides the deterministic proof required for high-stakes business operations.

### A. M&A Technical Due Diligence
When a corporation acquires a software company, it also acquires that company's technical debt. Traditional due diligence relies on developer interviews and subjective, high-level architecture reviews. GitGalaxy lets the acquiring firm drop the target repository into the engine and generate a SHBOM in seconds. The result is an immediate, quantitative map of the system's fragility, key-person dependencies (Silo Risk), and architectural drift that directly informs the valuation of the asset.

### B. Zero-Trust Security Compliance
Security and compliance audits (such as SOC 2) increasingly demand proof of a secure software development lifecycle. The SHBOM provides a permanent, immutable ledger of the system's structural integrity. Because GitGalaxy parses code without executing it, security teams can audit massive, highly classified, or broken legacy codebases in fully air-gapped, zero-trust environments.

### C. Autonomous AI Readiness Assessment
As enterprises rush to deploy Autonomous AI Agents (like Devin or GitHub Copilot Workspace) to refactor code, they face a massive risk: LLMs hallucinate when context windows are overwhelmed, and they break systems when state mutation is highly coupled.
The SHBOM acts as a DevAgent Firewall. It tells engineering leadership exactly which modules are safe for an AI to modify, and which modules are "Context Window Shredders" or "Hallucination Zones" that strictly require a Human-in-the-Loop (HITL).

## 3. A Deterministic Ledger of Reality

Codebases are living organisms; they decay over time.

By integrating GitGalaxy into a CI/CD pipeline, the engine generates a continuous stream of SHBOMs. This allows architectural leadership to track the delta of structural decay. You no longer have to guess if a refactoring initiative was successful, or if a new team is introducing systemic fragility. The physics engine proves it.

The SHBOM elevates software architecture from an abstract engineering concept into a measurable, auditable, and quantifiable business asset.

<br><br>

---

### 🌌 Powered by the blAST Engine

This documentation is part of the [GitGalaxy Ecosystem](https://github.com/squid-protocol/gitgalaxy), an AST-free, LLM-free heuristic knowledge graph engine.

* 📖 **[Previous: The Structural RAG Graph](./01-06-the-structural-rag-graph.md)**
* 📖 **[Next: Autonomous AI Guardrails](./01-08-autonomous-ai-guardrails.md)**
* 🪐 **[Explore the GitHub Repository](https://github.com/squid-protocol/gitgalaxy)** for code, tools, and updates.
* 🔭 **[Visualize your own repository at GitGalaxy.io](https://gitgalaxy.io/)** using our interactive 3D WebGPU dashboard.

**`docs/wiki/01-08-autonomous-ai-guardrails.md`** (new file: 55 additions, 0 deletions)

# 01-08: Autonomous AI Guardrails (The Deterministic Firewall)

> **The Necessity of Deterministic Control**
>
> The software engineering industry is aggressively adopting Large Language Models (LLMs) and Autonomous AI Agents (like Devin, Cursor, or Copilot Workspace) to write, refactor, and execute code.
>
> However, LLMs are fundamentally probabilistic. Software architecture is fundamentally deterministic. When you unleash a probabilistic agent onto a complex, deeply coupled, undocumented codebase, the result is not accelerated engineering; it is catastrophic, cascading failure. AI cannot reliably evaluate its own blast radius, and it cannot govern its own access controls.
>
> To safely adopt AI at the enterprise level, you cannot rely on more AI. You require a mathematical sandbox. GitGalaxy serves as a **Deterministic Firewall**, wrapping the repository in structural physics to protect the codebase *from* the AI, while protecting the AI from its own hallucinations.

GitGalaxy approaches AI governance across three distinct architectural vectors: regulating the AI as a developer, regulating the AI as a runtime feature, and sandboxing the AI for automated refactoring.

---

## 1. Regulating the AI Developer (The Dev Agent Firewall)

Before an autonomous agent is allowed to execute a refactoring ticket, the environment must be structurally assessed. GitGalaxy evaluates the "Token Physics" of the repository to anticipate where an LLM is statistically most likely to fail.

* **Context Window Shredders:** If a file combines a very large token count with extreme algorithmic complexity (e.g., $O(N^3)$), feeding it to an LLM will overflow the agent's context window. The agent will suffer from "forgetfulness," dropping critical logic during the rewrite.
* **The HITL Mandate (Human-In-The-Loop):** An AI does not know if a file is a load-bearing pillar. GitGalaxy cross-references the file’s PageRank (Blast Radius) against its Technical Debt. If an agent touches a highly-centralized, fragile file, the engine mandates explicit human review, preventing automated commits that could shatter the system.
* **Silent Mutation Risks:** If an agent modifies a file with high state volatility (`flux`) but zero unit test coverage, it cannot verify its own work. GitGalaxy flags these zones to prevent silent, untestable data corruption from entering production.
* **Hallucination Zones:** Files relying heavily on dynamic metaprogramming (reflection, macros) without adequate documentation cause AI to hallucinate missing methods. GitGalaxy maps these dead-zones natively.
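
A sketch of how these four flags might be derived from per-file metrics. Every field name and threshold below is a hypothetical placeholder rather than GitGalaxy's schema, and PageRank is assumed to be scaled so the repository mean is 1.0.

```python
from dataclasses import dataclass


@dataclass
class FileMetrics:
    path: str
    tokens: int             # approximate token count of the file
    complexity_order: int   # polynomial degree of worst loop nesting (3 for O(N^3))
    pagerank: float         # assumed scaled so the repository mean is 1.0
    tech_debt: float        # 0.0-1.0
    flux: float             # 0.0-1.0 state volatility
    test_coverage: float    # 0.0-1.0
    metaprogramming: int    # count of reflection / macro / eval sites
    doc_ratio: float        # comment lines per code line


def agent_flags(m):
    """Return Dev Agent Firewall flags for one file (illustrative thresholds)."""
    flags = []
    if m.tokens > 30_000 and m.complexity_order >= 3:
        flags.append("CONTEXT_WINDOW_SHREDDER")
    if m.pagerank > 1.5 and m.tech_debt > 0.6:
        flags.append("HITL_REQUIRED")        # central AND fragile: a human must review
    if m.flux > 0.5 and m.test_coverage == 0.0:
        flags.append("SILENT_MUTATION_RISK")  # the agent cannot verify its own change
    if m.metaprogramming > 5 and m.doc_ratio < 0.05:
        flags.append("HALLUCINATION_ZONE")
    return flags
```

An orchestration layer could then refuse to hand any file carrying `HITL_REQUIRED` to an agent without a human approver attached.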

## 2. Regulating Runtime AI (The AppSec Sensor)

Beyond development, engineers are rapidly embedding LLMs directly into application architectures. This introduces entirely new non-deterministic execution paths that traditional Static Application Security Testing (SAST) tools cannot comprehend.

GitGalaxy scans the intersection of **AI Logic**, **Public Exposure**, and **Destructive Capabilities** to hunt down weaponized AI integrations:
* **The RCE Funnel:** If an LLM prompt pipeline sits adjacent to OS-level execution (`eval`, `subprocess`) and is exposed to a public API router, a simple Prompt Injection becomes a critical Remote Code Execution (RCE) vulnerability.
* **God-Mode Agents:** If an AI is granted autonomous tool-calling wired directly to database write-access—without sufficient defensive programming (try/catch blocks)—a hallucination translates directly into autonomous data deletion.
* **The Exfiltration Vector:** If an LLM has access to outbound network sockets and environment variables, prompt injection can be used to perform Server-Side Request Forgery (SSRF) and quietly exfiltrate secrets, whether hardcoded or stored in the environment.
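
A deliberately crude sketch of the intersection idea: detect capability signals in a single Python source file and report risky combinations. The regexes and labels are hypothetical, and file-level co-occurrence is only a proxy; it does not prove that data actually flows from the prompt to the sink.

```python
import re

# Hypothetical capability signals for Python sources.
SIGNALS = {
    "ai": re.compile(r"\b(?:openai|anthropic|chat\.completions|llm\.invoke)\b"),
    "exec": re.compile(r"\b(?:eval|exec|os\.system|subprocess\.(?:run|Popen|call))\s*\("),
    "public": re.compile(r"@(?:app|router)\.(?:get|post|put|delete|route)\b"),
    "db_write": re.compile(r"\b(?:INSERT|UPDATE|DELETE|DROP)\b|\.(?:delete|commit)\s*\("),
    "egress": re.compile(r"\b(?:requests\.(?:get|post)|urllib\.request|httpx\.)"),
    "env": re.compile(r"\bos\.environ\b|\bgetenv\s*\("),
}


def classify(source):
    """Return the runtime-AI risk patterns whose signals co-occur in one file."""
    present = {name for name, pattern in SIGNALS.items() if pattern.search(source)}
    risks = []
    if {"ai", "exec", "public"} <= present:
        risks.append("RCE_FUNNEL")
    # God-Mode: AI plus destructive writes with no visible error handling.
    if {"ai", "db_write"} <= present and not re.search(r"\btry\s*:", source):
        risks.append("GOD_MODE_AGENT")
    if {"ai", "egress", "env"} <= present:
        risks.append("EXFILTRATION_VECTOR")
    return risks
```

A FastAPI-style route that passes model output straight into `subprocess.run` would be reported as `RCE_FUNNEL`, while an ordinary utility module produces no findings.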

## 3. The Deterministic Sandbox (Agent Task Forging)

When GitGalaxy is actively used to drive legacy modernization (such as translating COBOL to Java), it does not just hand the legacy code to the LLM and hope for the best. It restricts the AI using strict JSON Task Tickets.

Instead of flying blind, the LLM receives:
1. **Isolated Business Rules:** Only the exact, mathematically sliced logic required for the specific microservice.
2. **Explicit Dependency Graphs:** A hardcoded list of required external `CALL` statements, extracted by the DAG Architect, forcing the AI to use established interfaces rather than hallucinating new ones.
3. **Honesty Flags:** Contextual warnings injected by the parser (e.g., *"This module assumes EBCDIC encoding"*), forcing the AI to account for legacy edge cases it would otherwise ignore.
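
A minimal sketch of such a ticket, assuming the sliced business rules and honesty flags are already available. The JSON field names and helper functions are hypothetical; only static `CALL 'LITERAL'` targets are captured, because a dynamic `CALL` through a variable cannot be resolved without data-flow analysis.

```python
import json
import re

CALL_PATTERN = re.compile(r"\bCALL\s+['\"]([\w-]+)['\"]", re.IGNORECASE)


def extract_calls(cobol_source):
    """Static CALL targets, e.g. CALL 'TAXRATE' -> 'TAXRATE' (dynamic CALLs are skipped)."""
    return sorted({name.upper() for name in CALL_PATTERN.findall(cobol_source)})


def build_task_ticket(program, sliced_rules, cobol_source, honesty_flags):
    """Assemble a deterministic JSON ticket; every field name here is hypothetical."""
    ticket = {
        "program": program,
        "business_rules": sliced_rules,  # only the sliced lines, nothing else
        "required_calls": extract_calls(cobol_source),
        "honesty_flags": honesty_flags,
        "constraints": [
            "Call only the interfaces listed in required_calls.",
            "Do not invent new external dependencies.",
        ],
    }
    # sort_keys makes the output byte-identical for identical inputs.
    return json.dumps(ticket, indent=2, sort_keys=True)
```

Because the ticket is a pure function of the parsed source, two runs over the same commit produce the same ticket, which keeps the agent's inputs reproducible and reviewable.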

By bounding probabilistic AI models within deterministic structural physics, GitGalaxy guarantees that enterprises can leverage the velocity of LLMs without inheriting their instability.

<br><br>

---

### 🌌 Powered by the blAST Engine

This documentation is part of the [GitGalaxy Ecosystem](https://github.com/squid-protocol/gitgalaxy), an AST-free, LLM-free heuristic knowledge graph engine.

* 📖 **[Previous: The SHBOM Standard](./01-07-the-shbom-standard.md)**
* 📖 **[Next: The Continuous Delta Paradigm](./01-09-the-continuous-delta-paradigm.md)**
* 🪐 **[Explore the GitHub Repository](https://github.com/squid-protocol/gitgalaxy)** for code, tools, and updates.
* 🔭 **[Visualize your own repository at GitGalaxy.io](https://gitgalaxy.io/)** using our interactive 3D WebGPU dashboard.
66 changes: 66 additions & 0 deletions docs/wiki/01-09-the-continuous-delta-paradigm.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
# 01-09: The Continuous Delta Paradigm (Temporal Physics & CI/CD)

> **The Flaw of "Perfect" Parsing**
>
> The cybersecurity and software engineering industries treat Abstract Syntax Trees (ASTs) as the holy grail of code analysis. In theory, an AST is perfect: it provides an exact, compiler-grade model of the code's structure.
>
> In practice, deep AST analysis is precise but rarely runs where it matters most: on every commit.
>
> Generating a deep semantic tree for a 5-million-line polyglot monolith requires a flawless build environment, successful compilation, and often hours of compute time. Because developers will not wait 3 hours for a Pull Request pipeline to pass, AST scans are relegated to weekly, out-of-band "nightly builds." By the time the security or architecture team sees the report, the toxic code has already been merged, deployed, and depended upon.
>
> Security and architectural governance must happen in real-time, at the exact moment of the commit. To do that, you must abandon the AST and embrace the Continuous Delta Paradigm.

GitGalaxy resolves the CI/CD compute bottleneck through **AST-free structural physics**, state persistence, and lightning-fast delta monitoring. We do not re-scan the universe every time a single star moves. We only measure the delta.

---

## 1. The StateRehydrator (SQLite Persistence)

When GitGalaxy runs a full repository scan, the `RecordKeeper` writes the entire 50-dimensional physics graph to a highly normalized SQLite database (`gitgalaxy_master.db`). This becomes the immutable baseline.

When a developer opens a new Pull Request, GitGalaxy does not start from scratch. The **StateRehydrator** intercepts the pipeline. It reads the SQLite database and instantly loads the previous structural reality directly back into RAM.
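
The persist-then-rehydrate cycle can be sketched with Python's built-in `sqlite3` module. The two-table schema here is a hypothetical simplification; the actual layout of `gitgalaxy_master.db` is not described in this page.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS file_metrics (
    path TEXT NOT NULL, metric TEXT NOT NULL, value REAL NOT NULL,
    PRIMARY KEY (path, metric)
);
CREATE TABLE IF NOT EXISTS edges (
    src TEXT NOT NULL, dst TEXT NOT NULL,
    PRIMARY KEY (src, dst)
);
"""


def persist(conn, metrics, edges):
    """Write the baseline: metrics is {path: {metric: value}}, edges is [(src, dst)]."""
    conn.executescript(SCHEMA)
    rows = [(path, name, value) for path, values in metrics.items() for name, value in values.items()]
    with conn:  # one transaction; committed on success, rolled back on error
        conn.executemany("INSERT OR REPLACE INTO file_metrics VALUES (?, ?, ?)", rows)
        conn.executemany("INSERT OR IGNORE INTO edges VALUES (?, ?)", edges)


def rehydrate(conn):
    """Load the previous structural state back into in-memory dicts."""
    metrics, graph = {}, {}
    for path, name, value in conn.execute("SELECT path, metric, value FROM file_metrics"):
        metrics.setdefault(path, {})[name] = value
    for src, dst in conn.execute("SELECT src, dst FROM edges"):
        graph.setdefault(src, set()).add(dst)
    return metrics, graph
```

Storing metrics as `(path, metric, value)` rows keeps the schema stable as new dimensions are added, at the cost of one row per metric per file.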

Instead of scanning 100,000 files, the engine asks Git for the diff, isolates only the files that were modified (say, 12 of them), and pushes *only* those files through the Optical Pipeline. The resulting logic blocks are surgically grafted back into the global RAM state, and the entire Network Graph (PageRank, Blast Radius, Centrality) is recalculated in a fraction of a second.
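
The graft-and-recompute step can be sketched as follows, assuming the dependency graph maps each file to the files it depends on. In a pipeline, the changed-file list would typically come from `git diff --name-only BASE...HEAD`. `graft` is a hypothetical helper; `pagerank` is standard power iteration and `betweenness` is Brandes' algorithm, used here as plausible stand-ins for the engine's own network metrics. Recomputing graph metrics is cheap relative to re-parsing every file, which is where the delta saving comes from.

```python
from collections import deque


def graft(graph, rescanned, deleted=()):
    """Merge a delta scan: replace outgoing edges of re-scanned files, drop deleted files."""
    gone = set(deleted)
    merged = {src: set(dsts) - gone for src, dsts in graph.items() if src not in gone}
    for path, deps in rescanned.items():
        merged[path] = set(deps) - gone
    return merged


def _nodes(graph):
    return sorted(set(graph) | {dst for dsts in graph.values() for dst in dsts})


def pagerank(graph, damping=0.85, tol=1e-12, max_iter=200):
    """Power-iteration PageRank; rank flows from a file to the files it depends on."""
    nodes = _nodes(graph)
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(max_iter):
        # Mass held by files with no outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = dict.fromkeys(nodes, (1.0 - damping) / n + damping * dangling / n)
        for src, dsts in graph.items():
            for dst in dsts:
                new[dst] += damping * rank[src] / len(dsts)
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


def betweenness(graph):
    """Brandes' algorithm for unweighted, directed betweenness centrality."""
    nodes = _nodes(graph)
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        order, preds = [], {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS that counts shortest paths from s
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):  # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score
```

If `a.py` and `b.py` both import `core.py`, which imports `util.py`, then `core.py` sits on two shortest paths; after a delta where `b.py` now imports `util.py` directly, its betweenness drops to one.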

This transforms a 45-minute monolithic AST scan into a 0.8-second GitGalaxy Delta Scan, making synchronous Pull Request gating a physical reality.

## 2. The Chronometer (Temporal Physics)

Architecture is not just spatial; it is temporal. A beautifully written file is a massive liability if it is modified by 14 different developers every single week.

To map this, GitGalaxy employs the **Chronometer**. It hooks directly into the repository's version control stream (`git log`) to extract the exact modification history, commit timestamps, and author entropy for every file in the ecosystem.

* **Logarithmic Churn:** The Chronometer calculates "Deep Churn" by evaluating commit volume relative to the square root of a file's age. It dynamically finds the global maximum churn in the repository and normalizes all other files logarithmically against it, mapping a 0-100% Volatility Exposure.
* **Ownership Entropy:** It calculates the Shannon Entropy of the authors. A file written entirely by one person has 0.0 entropy (High Silo Risk). A file touched by 50 people has high entropy (High Friction Risk).
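
Both measures can be sketched in a few lines. The churn formula follows the description above (commits divided by the square root of age, then log-normalized against the repository maximum), but the use of `log1p`, the one-day age floor, and the function names are assumptions.

```python
import math
from collections import Counter


def volatility_exposure(commit_counts, ages_days):
    """Map each file to a 0-100 volatility score (hypothetical formula).

    Raw churn is commits / sqrt(age); scores are log-normalized against the
    repository maximum so the most volatile file scores 100.
    """
    raw = {f: commit_counts[f] / math.sqrt(max(ages_days[f], 1.0)) for f in commit_counts}
    peak = max(raw.values(), default=0.0)
    if peak == 0:
        return dict.fromkeys(raw, 0.0)
    return {f: 100.0 * math.log1p(r) / math.log1p(peak) for f, r in raw.items()}


def ownership_entropy(authors):
    """Shannon entropy (bits) of the per-commit author distribution for one file."""
    counts = Counter(authors)
    total = sum(counts.values())
    # "+ 0.0" turns -0.0 (a single author) into 0.0.
    return -sum((c / total) * math.log2(c / total) for c in counts.values()) + 0.0
```

A single-author file scores 0.0 bits (Silo Risk), two equal authors score 1.0, and four equal authors score 2.0; entropy grows with both the number of authors and how evenly they share the commits.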

## 3. The Hardware Guillotine

Parsing massive histories for a monolith with millions of commits introduces a dangerous risk: hanging subprocesses. If a `git log` command stalls, it will freeze the CI/CD runner forever, consuming pipeline minutes and blocking deployments.

GitGalaxy defends the CI/CD pipeline using the **Hardware Guillotine**.

The Chronometer enforces a strict POSIX alarm. If the Git stream (or any regex extraction) exceeds the permitted execution window, the guillotine drops: the OS issues a `SIGKILL`, terminating the hung process immediately. Pipes are forcefully flushed and file descriptors are closed to prevent resource leaks. The pipeline logs a partial timeout but safely continues execution, so GitGalaxy never deadlocks a production build pipeline.
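
A portable sketch of the same guarantee in Python. Note that this swaps the POSIX alarm for `subprocess.run`'s `timeout` argument: when the timeout expires, CPython kills the child (with `SIGKILL` on POSIX), waits for it so no zombie remains, and raises `TimeoutExpired`. The helper name is hypothetical.

```python
import subprocess


def run_with_guillotine(cmd, timeout_s):
    """Run `cmd`, killing it if it exceeds `timeout_s` seconds.

    Returns (timed_out, stdout). On timeout, subprocess.run kills the child,
    reaps it, and re-raises; we catch that and hand back any partial output.
    """
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s, check=False)
        return False, done.stdout
    except subprocess.TimeoutExpired as exc:
        partial = exc.stdout or ""
        if isinstance(partial, bytes):  # partial output may come back undecoded
            partial = partial.decode(errors="replace")
        return True, partial
```

A history extraction might call `run_with_guillotine(["git", "log", "--numstat", "--format=%H|%an|%at"], 120)` and, when the first element of the result is `True`, log a partial timeout and move on with whatever history was captured.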

## 4. Real-Time Architectural Drift

By combining Delta Scans with Temporal Physics, GitGalaxy shifts the enterprise posture from reactive to proactive.

A CI/CD pipeline is no longer just a place to run unit tests. It becomes a deterministic architectural firewall. You can configure branch protections to automatically block a Pull Request if:
* The PR introduces an undocumented "Shadow API".
* The PR touches a file with a PageRank > 1.5 without adding tests (Silent Mutation Risk).
* The PR increases the Cognitive Load of a core orchestrator module by more than 10%.
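
A sketch of such a gate, written as a pure function a CI job could call after the delta scan. The thresholds mirror the examples above; the input shapes, the mean-normalized PageRank scale, and treating "adding tests" as "any test file changed in the PR" are all assumptions.

```python
def gate_pull_request(before, after, changed, tests_changed, new_routes,
                      documented_routes, core_modules,
                      pagerank_limit=1.5, load_growth_limit=0.10):
    """Return blocking reasons for a PR; an empty list means it may merge.

    before/after: {path: {"pagerank": float, "cognitive_load": float}}, with
    PageRank assumed to be scaled so the repository mean is 1.0.
    """
    reasons = []
    # 1. Routes that appear in code but not in the API spec ("Shadow APIs").
    for route in sorted(set(new_routes) - set(documented_routes)):
        reasons.append(f"Shadow API: {route}")
    for path in sorted(changed):
        metrics = after.get(path, {})
        # 2. Central file changed with no test changes in the same PR.
        if metrics.get("pagerank", 0.0) > pagerank_limit and not tests_changed:
            reasons.append(f"Silent Mutation Risk: {path}")
        # 3. Cognitive Load growth on a core orchestrator module.
        old = before.get(path, {}).get("cognitive_load")
        new = metrics.get("cognitive_load")
        if path in core_modules and old and new is not None:
            growth = (new - old) / old
            if growth > load_growth_limit:
                reasons.append(f"Cognitive Load +{growth:.0%}: {path}")
    return reasons
```

A branch-protection check would fail the job whenever the returned list is non-empty and print the reasons as the failure summary.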

The Continuous Delta Paradigm proves that velocity and structural integrity are not mutually exclusive.

<br><br>

---

### 🌌 Powered by the blAST Engine

This documentation is part of the [GitGalaxy Ecosystem](https://github.com/squid-protocol/gitgalaxy), an AST-free, LLM-free heuristic knowledge graph engine.

* 📖 **[Previous: Autonomous AI Guardrails](./01-08-autonomous-ai-guardrails.md)**
* 📖 **[Next: Pipeline Overview](./02-01-pipeline-overview.md)**
* 🪐 **[Explore the GitHub Repository](https://github.com/squid-protocol/gitgalaxy)** for code, tools, and updates.
* 🔭 **[Visualize your own repository at GitGalaxy.io](https://gitgalaxy.io/)** using our interactive 3D WebGPU dashboard.