
Refactor: Delegate GRPO and Evasion capabilities to OSS Alternatives (OpenRLHF, browserforge) #201

@gowthamrao


Title: Refactor: Delegate GRPO and Evasion capabilities to OSS Alternatives (OpenRLHF, browserforge) #187

Body:

Background

In alignment with our "Borrow, Do Not Build" engineering doctrine, we have identified two major subsystems in the CoReason platform that implement custom, proprietary math and logic that maintained OSS libraries already handle natively:

  1. GRPO & PRMs: We currently track RLHF/GRPO advantage scores and Process Reward Model (PRM) evaluations manually via EpistemicRewardGradientPolicy, CognitiveRewardEvaluationReceipt, and ProcessRewardContract. Computing PPO/GRPO policy gradients and managing KL-divergence penalties in-house across distributed GPUs is unstable at scale.
  2. Browser Evasion: We built AdversarialEmulationProfile, KinematicNoiseProfile, and EnvironmentalSpoofingProfile to implement 1/f (pink noise) mouse movements, JA3 TLS fingerprint spoofing, and WebGL canvas hash spoofing. Evading CDN bot detection is a cat-and-mouse game that requires daily updates, which makes an in-house implementation fragile.
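For reference, the core math that would move out of CoReason is small but easy to get subtly wrong. Below is a minimal sketch of the two quantities named above. It assumes the standard GRPO formulation, in which each sampled completion's reward is normalized against the other completions for the same prompt, and the common non-negative "k3" per-token KL estimator. The function names are hypothetical, and this is illustrative only, not CoReason's current code:

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: score each completion relative to its group.

    `rewards` holds one scalar per completion sampled for the SAME prompt.
    Uses the population standard deviation; some trainers use the sample
    standard deviation instead (an assumption worth checking per library).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


def kl_penalty_k3(logp_policy: float, logp_ref: float) -> float:
    """Per-token KL(policy || ref) estimate: exp(d) - d - 1, d = logp_ref - logp_policy.

    Non-negative for any d, and unbiased when tokens are sampled from the policy.
    """
    d = logp_ref - logp_policy
    return math.exp(d) - d - 1.0
```

With rewards `[1.0, 0.0, 0.0, 1.0]` the advantages come out at roughly `[1, -1, -1, 1]`. A group in which every completion receives the same reward yields all-zero advantages and contributes no gradient. Details like population versus sample standard deviation vary between implementations, which is one more reason to defer to a maintained trainer.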

Proposed Solution

Rip out the internal logic and delegate to OSS primitives:

  • OpenRLHF / HuggingFace TRL: We will use coreason-manifest to label data (creating EpistemicGroundedTaskManifest instances), but offload the actual backpropagation and step-level PRM verification to OpenRLHF (or TRL).
  • browserforge / curl-impersonate: Delete the custom TLS and canvas spoofing math and delegate browser instantiation to maintained OSS libraries. This keeps CoReason logic strictly focused on deterministic navigation and clicking rather than environment spoofing.
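On the training side, the hand-off can be as thin as a data export. The sketch below writes JSONL records that an external trainer can ingest. It assumes EpistemicGroundedTaskManifest exposes something like a task ID, a prompt, and a ground-truth answer. The dataclass and its field names are hypothetical stand-ins, not the real coreason-manifest model, and the column names the chosen trainer expects should be confirmed against its documentation:

```python
import io
import json
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass(frozen=True)
class EpistemicGroundedTaskManifest:
    # Hypothetical stand-in: the real coreason-manifest model's fields may differ.
    task_id: str
    prompt: str
    ground_truth: str


def export_jsonl(manifests: Iterable[EpistemicGroundedTaskManifest], stream: io.TextIOBase) -> int:
    """Write one JSON object per line and return how many records were written.

    CoReason only produces labeled data here; rollouts, rewards, and
    backpropagation would all happen inside the external trainer.
    """
    count = 0
    for manifest in manifests:
        stream.write(json.dumps(asdict(manifest), ensure_ascii=False) + "\n")
        count += 1
    return count
```

Passing a file opened with `open(path, "w", encoding="utf-8")` instead of an in-memory buffer produces a dataset file that can be handed to the trainer.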
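On the browser side, the boundary could be expressed as a narrow interface that CoReason depends on. Fingerprint and TLS handling would then live entirely inside whatever adapter implements it. Below is a minimal sketch that models only `goto` and `click`. BrowserSession, Step, run_steps, and RecordingSession are hypothetical names, not an existing CoReason or library API:

```python
from dataclasses import dataclass, field
from typing import Protocol


class BrowserSession(Protocol):
    """Hypothetical boundary: the only operations CoReason navigation logic needs.

    How the browser is launched and fingerprinted is the implementer's concern.
    """

    def goto(self, url: str) -> None: ...

    def click(self, selector: str) -> None: ...


@dataclass(frozen=True)
class Step:
    action: str  # "goto" or "click"
    target: str  # URL for "goto", selector for "click"


def run_steps(session: BrowserSession, steps: list[Step]) -> None:
    """Replay a fixed script: the same steps always produce the same calls, in order."""
    for step in steps:
        if step.action == "goto":
            session.goto(step.target)
        elif step.action == "click":
            session.click(step.target)
        else:
            raise ValueError(f"unknown action: {step.action!r}")


@dataclass
class RecordingSession:
    """Test double that records calls instead of driving a real browser."""

    calls: list[tuple[str, str]] = field(default_factory=list)

    def goto(self, url: str) -> None:
        self.calls.append(("goto", url))

    def click(self, selector: str) -> None:
        self.calls.append(("click", selector))
```

Because `run_steps` only sees the protocol, tests can use the recording double while production wires in an adapter around the chosen OSS browser stack.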

Tasks

  • Remove EpistemicRewardGradientPolicy, CognitiveRewardEvaluationReceipt, and ProcessRewardContract from coreason-manifest.
  • Remove AdversarialEmulationProfile, KinematicNoiseProfile, and EnvironmentalSpoofingProfile from coreason-manifest.
  • Run universal_ontology_compiler.py to regenerate the JSON and language bindings.
  • Rip out the active inference GRPO evaluation logic from coreason-runtime.
  • Validate the runtime and confirm the CI/CD tests pass.
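For the validation step, one cheap regression check is to confirm that none of the six removed models survive in the regenerated output. Below is a minimal sketch. It assumes universal_ontology_compiler.py emits a JSON Schema-style document, and the path passed to `check_schema_file` is hypothetical. Because it scans both keys and string values, it also catches dangling `$ref` pointers to a removed model:

```python
import json
from typing import Any

# The six models this issue removes from coreason-manifest.
REMOVED_MODELS = frozenset({
    "EpistemicRewardGradientPolicy",
    "CognitiveRewardEvaluationReceipt",
    "ProcessRewardContract",
    "AdversarialEmulationProfile",
    "KinematicNoiseProfile",
    "EnvironmentalSpoofingProfile",
})


def find_leftovers(node: Any, names: frozenset[str] = REMOVED_MODELS) -> set[str]:
    """Recursively collect removed model names found in dict keys or string values.

    Substring matching means "#/$defs/ProcessRewardContract" is flagged too.
    """
    found: set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            found |= {n for n in names if n in key}
            found |= find_leftovers(value, names)
    elif isinstance(node, list):
        for item in node:
            found |= find_leftovers(item, names)
    elif isinstance(node, str):
        found |= {n for n in names if n in node}
    return found


def check_schema_file(path: str) -> None:
    """Fail loudly (for CI) if a regenerated schema still mentions a removed model."""
    with open(path, encoding="utf-8") as fh:
        leftovers = find_leftovers(json.load(fh))
    if leftovers:
        raise SystemExit(f"removed models still referenced: {sorted(leftovers)}")
```

Substring matching is deliberately broad. A renamed variant such as a hypothetical `ProcessRewardContractV2` would also be flagged, which is usually what a cleanup check should do.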

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
