Place filtering incorrectly filters at person level instead of household level #224

@anth-volk

Bug Description

The _filter_us_simulation_by_place() method in simulation.py filters the dataset at the person level instead of the household level, causing incorrect results in economy comparisons.

Current Implementation (Incorrect)

def _filter_us_simulation_by_place(self, simulation, simulation_type, region, reform):
    _, place_fips_code = parse_us_place_region(region)
    df = simulation.to_input_dataframe()  # Returns person-level data
    person_place_fips = simulation.calculate("place_fips", map_to="person").values
    mask = (person_place_fips == place_fips_code) | (person_place_fips == place_fips_code.encode())
    return simulation_type(dataset=df[mask], reform=reform)  # Filters PERSONS

Expected Behavior

Should filter at the household level, keeping all persons in matching households, as demonstrated in the subsample() method in policyengine-core:

# Correct pattern from subsample():
h_df = df.groupby(household_id_column).first()
chosen_household_ids = h_df[mask].index
subset_df = df[df[household_id_column].isin(chosen_household_ids)]
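
As a rough illustration of that pattern on toy data, the sketch below builds the mask on a one-row-per-household frame and then keeps every person row whose household was selected. The column names (`household_id`, `place_fips`, `person_id`) and the helper `filter_households_by_place` are hypothetical; real input dataframes may use different names. For brevity it only compares against the string form of the FIPS code, not the encoded bytes form that the current implementation also checks.

```python
import pandas as pd

# Hypothetical column names, chosen for illustration only.
HOUSEHOLD_ID = "household_id"
PLACE_FIPS = "place_fips"


def filter_households_by_place(df: pd.DataFrame, place_fips_code: str) -> pd.DataFrame:
    """Keep every person row that belongs to a household in the given place."""
    # Collapse to one row per household, indexed by household ID.
    h_df = df.groupby(HOUSEHOLD_ID).first()
    # Build the mask at household level, not person level.
    household_mask = h_df[PLACE_FIPS] == place_fips_code
    chosen_household_ids = h_df[household_mask].index
    # Select all persons whose household was chosen.
    return df[df[HOUSEHOLD_ID].isin(chosen_household_ids)]


# Toy person-level data: households 10 and 30 are in place "51000".
people = pd.DataFrame(
    {
        "person_id": [1, 2, 3, 4, 5],
        HOUSEHOLD_ID: [10, 10, 20, 30, 30],
        PLACE_FIPS: ["51000", "51000", "44000", "51000", "51000"],
    }
)

subset = filter_households_by_place(people, "51000")
print(subset)
```

The point of the design is that the selection decision is made once per household, so a household is either kept with all of its members or dropped entirely.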

Impact

When running the Mamdani NYC income tax analysis:

  • Expected Decile 10 average: ~$-36,149 (from Jupyter notebook)
  • Actual Decile 10 average: ~$-15,889 (from app using place filtering)
  • The budgetary impact matches (~$8.87B), indicating that the filtering captures the right population but that the decile averages are calculated incorrectly

Root Cause

to_input_dataframe() maps all variables to person level (line 1516 in policyengine-core). When you filter this person-level dataframe directly and create a new simulation, household-level variable calculations become incorrect.
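
The toy example below is not a reproduction of this bug. It only illustrates, under assumed column names, why the entity level matters once household-level values have been repeated onto person rows: a statistic computed over person rows weights each household by its size, while the same statistic computed over one row per household does not.

```python
import pandas as pd

# Hypothetical person-level frame: the household-level value is repeated
# on every person row of the household.
people = pd.DataFrame(
    {
        "household_id": [1, 1, 1, 2],
        "household_net_income_change": [-300.0, -300.0, -300.0, -100.0],
    }
)

# Averaging over person rows counts household 1 three times.
person_row_mean = people["household_net_income_change"].mean()

# Collapsing to one row per household first gives the household-level mean.
households = people.groupby("household_id").first()
household_mean = households["household_net_income_change"].mean()

print(person_row_mean, household_mean)
```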

Note

The UK country filtering (country/ regions) has the same issue: it also uses map_to="person" and filters at the person level.
