This repository contains implementations of agents from the Learning Classifier Systems family, such as ACS and ACS2, enhanced with techniques like Experience Replay, Hindsight Experience Replay, and a new technique: Value Consistency Prioritization.
Value Consistency Prioritization is based on the paper by Claudia Russo, Daniela Barni, Ioana Zagrean and Francesca Danioni: https://www.mdpi.com/1210682
This repository contains:
- `openai-envs` - Gymnasium-based library with added new environments
- `pyalcs` - LCS agents, including the new ACS2VCP agent
- `pyalcs-experiments` - Scripts and notebooks with experiments
| Agent | Description |
|---|---|
| ACS2 | Anticipatory Learning Classifier System 2 |
| ACS2ER | ACS2 with Experience Replay |
| ACS2HER | ACS2 with Hindsight Experience Replay |
| ACS2VCP | ACS2 with Value Consistency Prioritization |
Each agent has experiment scripts for the Maze4, Maze5, and Maze7 environments in `pyalcs-experiments/scripts/`.
Create the conda environment and install the local packages:

```shell
cd pyalcs-experiments
conda env create --file environment-base.yml
conda activate pyalcs-experiments
conda env update --file environment-base.yml --prune
cd ..
pip install -e ./pyalcs
pip install -e ./openai-envs
```

Minimal example using ACS2 on a maze environment:
```python
import gym
import gym_maze

from lcs.agents.acs2 import ACS2, Configuration

cfg = Configuration(
    classifier_length=8,
    number_of_possible_actions=8,
    epsilon=0.8,
    beta=0.05,
    gamma=0.95,
)
agent = ACS2(cfg)
maze = gym.make('Maze4-v0')

# Explore: the agent learns the environment model
explore_metrics = agent.explore(maze, 500)

# Exploit: the agent uses learned knowledge (no exploration)
exploit_metrics = agent.exploit(maze, 200)

# Access the learned classifiers
population = agent.get_population()
```

| Parameter | Default | Description |
|---|---|---|
| `classifier_length` | required | Length of condition/effect strings |
| `number_of_possible_actions` | required | Number of possible actions |
| `epsilon` | 0.5 | Exploration probability (random action rate) |
| `beta` | 0.05 | Learning rate |
| `gamma` | 0.95 | Discount factor |
| `chi` | 0.8 | Crossover probability |
| `mu` | 0.3 | Mutation probability |
| `do_ga` | False | Enable genetic generalization |
| `do_pee` | False | Enable Probability-Enhanced Effects |
| `metrics_trial_frequency` | 1 | Collect metrics every N trials |
| `user_metrics_collector_fcn` | None | Custom metrics callback `fn(agent, env) -> dict` |
| Parameter | Default | Description |
|---|---|---|
| `er_buffer_size` | 10000 | Replay buffer size |
| `er_min_samples` | 1000 | Minimum samples before replay starts |
| `er_samples_number` | 3 | Samples replayed per step |
| Parameter | Default | Description |
|---|---|---|
| `er_buffer_size` | 10000 | Replay buffer size |
| `er_samples_number` | 8 | Samples replayed per step |
| `her_goals_number` | 3 | HER goals to generate per transition |
| `her_strategy` | None | Goal selection: `'final'` or `'future'` |
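The two goal-selection strategies can be illustrated with a small, self-contained sketch. This is not the library's internal code: the `(obs, action, next_obs)` transition format and the string-concatenation of the goal are assumptions, based on the fact that `classifier_length` doubles for HER/VCP because the goal state is appended to the perception.

```python
import random

def relabel(episode, strategy, goals_number):
    """Hedged sketch of HER goal relabelling: 'final' reuses the episode's
    last achieved state as the goal, 'future' samples from states achieved
    later in the same episode. Transitions are (obs, action, next_obs)
    tuples of perception strings."""
    relabeled = []
    for t, (obs, action, next_obs) in enumerate(episode):
        if strategy == 'final':
            goals = [episode[-1][2]] * goals_number
        else:  # 'future'
            later = [tr[2] for tr in episode[t:]]
            goals = [random.choice(later) for _ in range(goals_number)]
        for goal in goals:
            # Goal appended to the perception -> doubled classifier_length
            relabeled.append((obs + goal, action, next_obs + goal))
    return relabeled
```

With `'final'`, every transition is paired with the state the episode actually ended in, so even a failed episode yields transitions that "reach" a goal.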
Uses the same configuration as ACS2HER. The ensemble size is passed to the constructor:

```python
agent = ACS2VCPv10(cfg, ensemble_size=4)
```

All agents follow the same pattern. The key differences are the imports, the configuration parameters, and `classifier_length` (16 for HER/VCP because the goal state is appended to the perception).
ACS2ER:

```python
from lcs.agents.acs2er import ACS2ER, Configuration

cfg = Configuration(
    classifier_length=8,
    number_of_possible_actions=8,
    er_buffer_size=10000,
    er_min_samples=1000,
    er_samples_number=8,
    epsilon=0.8, beta=0.05, gamma=0.95,
)
agent = ACS2ER(cfg)
```

ACS2HER:
```python
from lcs.agents.acs2her import ACS2HER, Configuration

cfg = Configuration(
    classifier_length=16,
    number_of_possible_actions=8,
    er_buffer_size=10000,
    er_samples_number=8,
    her_goals_number=2,
    epsilon=0.8, beta=0.05, gamma=0.95,
)
agent = ACS2HER(cfg)
```

ACS2VCP:
```python
from lcs.agents.acs2vcp import ACS2VCPv10, Configuration

cfg = Configuration(
    classifier_length=16,
    number_of_possible_actions=8,
    er_buffer_size=10000,
    er_min_samples=1000,
    er_samples_number=8,
    her_goals_number=2,
    epsilon=0.8, beta=0.05, gamma=0.95,
)
agent = ACS2VCPv10(cfg, ensemble_size=4)
```

Experiment scripts are in `pyalcs-experiments/scripts/` with the naming convention `run_{agent}_{maze}.py`.
```shell
cd pyalcs-experiments
export PYTHONPATH=$(pwd)

# Run ACS2 on Maze4
python scripts/ACS2/run_acs2_maze4.py

# Run ACS2VCP on Maze7
python scripts/ACS2VCP/run_acs2vcp_maze7.py
```

Each experiment runs 30 repeats with three phases:
- Explore (500 trials, epsilon=0.8) - agent learns the environment
- Exploit (200 trials, epsilon=0.2) - evaluation with mild exploration
- Exploit (2x200 trials, epsilon=0.0) - pure greedy evaluation
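The three phases above can be sketched as a driver function. This is a sketch only: the assumption that `epsilon` can be reassigned on the live configuration between phases may not match how the actual scripts set up each phase.

```python
def run_repeat(agent, env):
    """Hedged sketch of one experiment repeat with the three phases."""
    # Phase 1: explore (500 trials, epsilon=0.8) - learn the environment model
    agent.cfg.epsilon = 0.8
    explore_metrics = agent.explore(env, 500)

    # Phase 2: exploit with mild exploration (200 trials, epsilon=0.2)
    agent.cfg.epsilon = 0.2
    soft_metrics = agent.exploit(env, 200)

    # Phase 3: two pure greedy evaluations (2 x 200 trials, epsilon=0.0)
    agent.cfg.epsilon = 0.0
    greedy_metrics = [agent.exploit(env, 200) for _ in range(2)]

    return explore_metrics, soft_metrics, greedy_metrics
```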
Results are saved as `experiment_log.json` and `.dill` files under `scripts/{AGENT}/MAZE/`.
The maze environments are OpenAI Gym-compatible:

```python
import gym
import gym_maze

maze = gym.make('Maze4-v0')  # also: Maze5-v0, Maze7-v0
```

| Property | Value |
|---|---|
| Actions | 8 discrete (N, NE, E, SE, S, SW, W, NW) |
| Observation | 8-element perception of surrounding cells |
| Reward | +1000 at goal, 0 otherwise |
| Max steps | 50 per episode |
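For reference, a manual interaction loop using the classic Gym step API (the `random_walk` helper is illustrative, not part of the repository):

```python
import random

def random_walk(env, max_steps=50):
    """Take random actions until the episode ends or the step limit is hit.
    Returns the accumulated reward (+1000 if the goal was reached)."""
    obs = env.reset()
    total_reward = 0
    for _ in range(max_steps):
        action = random.randrange(8)            # 8 discrete movement directions
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

With `gym.make('Maze4-v0')` this returns 1000 when the random walk happens to reach the goal within the 50-step limit, and 0 otherwise.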
Additional environments (grids, corridors, woods, etc.) are registered in `openai-envs/` - see `gym_maze/__init__.py` for the full list.
`agent.explore()` and `agent.exploit()` return a list of dicts, one per trial:

| Key | Description |
|---|---|
| `trial` | Trial number |
| `steps_in_trial` | Steps taken |
| `reward` | Reward received |
| `perf_time` | Wall-clock time (seconds) |
Custom metrics can be added via `user_metrics_collector_fcn` in the configuration. The experiment scripts add `knowledge` (the percentage of correctly anticipated transitions) and population statistics.
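A minimal custom collector might look like this. It is a sketch: the `population` attribute and `is_reliable()` method names are assumptions about the agent's API, not confirmed by this README.

```python
def population_metrics(agent, env):
    """Hedged sketch of a user_metrics_collector_fcn: returns a dict
    that is merged into the per-trial metrics."""
    population = agent.population
    return {
        'population_size': len(population),
        'reliable': sum(1 for cl in population if cl.is_reliable()),
    }
```

The callback would then be passed as `Configuration(..., user_metrics_collector_fcn=population_metrics)` and evaluated every `metrics_trial_frequency` trials.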
If you use this library, please cite this paper:
Olgierd Unold and Jan Zemło. 2026. Value Consistency Prioritization for Accelerating
Knowledge Discovery in Sparse Reward Anticipatory Classifier Systems. In Proceedings
of the Genetic and Evolutionary Computation Conference Companion. ACM.