Add region structures and filtering #229

@anth-volk

Summary

Add a first-class region concept to policyengine.py to support geographic simulation routing in the API v2 migration, so that the API can route each simulation request to the correct dataset for the requested region.

Background

Currently, region handling is scattered across the frontend (policyengine-app-v2) and API (v1). As part of the v2 migration, we need a canonical source of truth for:

  • What regions each country model supports
  • Which dataset each region uses
  • Whether a region requires filtering from a parent dataset

Proposed Design

Region Class (core/region.py)

from pydantic import BaseModel


class Region(BaseModel):
    code: str                          # "ca", "us", "CA-01", "MI-22000"
    label: str                         # "California", "United States", "California 1st", "Detroit"
    region_type: str                   # "national", "state", "congressional_district", "place"
    parent_code: str | None = None     # For hierarchy (e.g., state -> national)

    # Dataset routing
    dataset_name: str | None = None    # Name of dedicated dataset (if has one)
    requires_filter: bool = False      # True if must filter from parent dataset
    filter_field: str | None = None    # Field to filter on (e.g., "place_fips")
    filter_value: str | None = None    # Value to match (if different from code)

    @property
    def identifier(self) -> str:
        """Generated prefixed identifier for API/URL use."""
        if self.region_type == "national":
            return self.code
        return f"{self.region_type}/{self.code}"
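
To illustrate how the `identifier` property is meant to behave, here is a minimal sketch. It uses a stdlib dataclass as a stand-in for the proposed pydantic model so that it runs without dependencies, and it carries only the fields that `identifier` needs. The `parent_code` values shown are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """Dataclass stand-in for the proposed pydantic Region model."""
    code: str
    label: str
    region_type: str
    parent_code: Optional[str] = None

    @property
    def identifier(self) -> str:
        # National regions use the bare code; all others are type-prefixed.
        if self.region_type == "national":
            return self.code
        return f"{self.region_type}/{self.code}"


# Example regions; the parent codes are assumed, not taken from the proposal.
us = Region(code="us", label="United States", region_type="national")
ca = Region(code="ca", label="California", region_type="state", parent_code="us")
detroit = Region(code="MI-22000", label="Detroit", region_type="place", parent_code="mi")

identifiers = [region.identifier for region in (us, ca, detroit)]
print(identifiers)
```

Prefixing with the region type keeps identifiers unambiguous when two region types could share a code, while the national region keeps a short, URL-friendly form.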

RegionRegistry Class

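class RegionRegistry(BaseModel):
    country_id: str
    regions: list[Region]

    # Look up a single region by its prefixed identifier
    def get(self, identifier: str) -> Region: ...
    # All regions of one region_type (e.g., every "state")
    def get_by_type(self, region_type: str) -> list[Region]: ...
    # Regions that must be filtered from a parent dataset
    def get_filterable_regions(self) -> list[Region]: ...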
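
One way the three stubbed methods could be filled in is sketched below. This is not the final implementation: it again uses stdlib dataclasses in place of pydantic, assumes `get()` is keyed on `Region.identifier`, assumes an unknown identifier should raise `KeyError`, and assumes "filterable" means `requires_filter` is true. The `_by_identifier` index is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    """Trimmed stand-in carrying only the fields the registry needs."""
    code: str
    region_type: str
    requires_filter: bool = False

    @property
    def identifier(self) -> str:
        if self.region_type == "national":
            return self.code
        return f"{self.region_type}/{self.code}"


@dataclass
class RegionRegistry:
    country_id: str
    regions: List[Region]

    def __post_init__(self) -> None:
        # Build the index once so get() is a dict lookup, not a list scan.
        self._by_identifier = {r.identifier: r for r in self.regions}

    def get(self, identifier: str) -> Region:
        try:
            return self._by_identifier[identifier]
        except KeyError:
            raise KeyError(
                f"Unknown region {identifier!r} for country {self.country_id!r}"
            ) from None

    def get_by_type(self, region_type: str) -> List[Region]:
        return [r for r in self.regions if r.region_type == region_type]

    def get_filterable_regions(self) -> List[Region]:
        return [r for r in self.regions if r.requires_filter]


registry = RegionRegistry(
    country_id="us",
    regions=[
        Region("us", "national"),
        Region("ca", "state"),
        Region("MI-22000", "place", requires_filter=True),
    ],
)
print(registry.get("state/ca"))
print([r.code for r in registry.get_filterable_regions()])
```

Raising on an unknown identifier, rather than returning `None`, would let the API turn a bad region into a clear client error instead of silently falling back to another dataset.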

Country-Specific Definitions (countries/{country}/regions.py)

Each country defines its supported regions:

  • US: national, 50 states + DC, 435 congressional districts, places/cities
  • UK: national, countries (England, Scotland, Wales, NI), constituencies, local authorities

Integration Points

  1. TaxBenefitModelVersion: Attach region_registry property
  2. Dataset: Add optional region field for regional datasets
  3. Filtering: Add a Dataset.filter_to_region() method for regions, such as places, that must be filtered from a parent dataset
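
The routing and filtering described above could work roughly as follows. This is a sketch under several assumptions: `resolve_dataset` and `filter_rows` are hypothetical helpers (the proposal names only `Dataset.filter_to_region()`), a filtered region borrows the dataset of its nearest ancestor that has one, the hierarchy is acyclic, region codes are unique within a country, and the dataset is represented as a plain list of row dicts. The dataset name `us_national` and the field `state_fips` are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    """Dataclass stand-in for the proposed Region model (routing fields only)."""
    code: str
    region_type: str
    parent_code: Optional[str] = None
    dataset_name: Optional[str] = None
    requires_filter: bool = False
    filter_field: Optional[str] = None
    filter_value: Optional[str] = None


def resolve_dataset(
    region: Region, by_code: Dict[str, Region]
) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Return (dataset_name, filter), where filter is None or (field, value)."""
    if not region.requires_filter:
        if region.dataset_name is None:
            raise ValueError(f"{region.code!r} has neither a dataset nor a filter")
        return region.dataset_name, None
    if region.filter_field is None:
        raise ValueError(f"{region.code!r} requires a filter but has no filter_field")
    # Walk up the hierarchy to the nearest ancestor with a dedicated dataset.
    ancestor = region
    while True:
        if ancestor.parent_code is None:
            raise ValueError(f"No ancestor of {region.code!r} has a dataset")
        ancestor = by_code[ancestor.parent_code]
        if ancestor.dataset_name is not None:
            break
    # filter_value overrides the code only when it is set.
    value = region.filter_value if region.filter_value is not None else region.code
    return ancestor.dataset_name, (region.filter_field, value)


def filter_rows(rows: List[dict], field: str, value: str) -> List[dict]:
    """Keep only the rows whose `field` equals `value`."""
    return [row for row in rows if row.get(field) == value]


regions = [
    Region("us", "national", dataset_name="us_national"),
    Region("mi", "state", parent_code="us", requires_filter=True,
           filter_field="state_fips", filter_value="26"),
    Region("MI-22000", "place", parent_code="mi", requires_filter=True,
           filter_field="place_fips", filter_value="22000"),
]
by_code = {r.code: r for r in regions}

dataset_name, region_filter = resolve_dataset(by_code["MI-22000"], by_code)
rows = [
    {"household_id": 1, "place_fips": "22000"},
    {"household_id": 2, "place_fips": "00000"},
]
detroit_rows = filter_rows(rows, *region_filter)
print(dataset_name, region_filter, detroit_rows)
```

Keeping resolution (which dataset, which filter) separate from the filtering itself would let the API decide routing from the registry alone, before any dataset is loaded.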

Use Case: API v2 Database Seeding

The regions defined in policyengine.py become the source of truth for seeding the API v2 alpha database:

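from policyengine.countries.us.regions import US_REGIONS

# RegionModel is the API-side database model for regions (not defined in this issue)
def seed_regions(session, model_id, registry):
    # Insert one database row per region in the registry
    for region in registry.regions:
        session.add(RegionModel(
            tax_benefit_model_id=model_id,
            code=region.code,
            label=region.label,
            type=region.region_type,
            dataset_name=region.dataset_name,
            requires_filter=region.requires_filter,
            # ...
        ))

# e.g. seed_regions(session, model_id, US_REGIONS)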

Tasks

  • Create core/region.py with Region and RegionRegistry classes
  • Create countries/ package structure with base classes
  • Define US regions in countries/us/regions.py
  • Define UK regions in countries/uk/regions.py
  • Attach region registry to TaxBenefitModelVersion
  • Add filter_to_region() method to Dataset class
  • Add tests for region lookup and filtering
  • Document the region system

Related

  • policyengine-api-v2-alpha migration
  • policyengine-app-v2 geography handling
