Closed
Description
Bug Description
The _filter_us_simulation_by_place() method in simulation.py filters the dataset at the person level instead of the household level, causing incorrect results in economy comparisons.
Current Implementation (Incorrect)
def _filter_us_simulation_by_place(self, simulation, simulation_type, region, reform):
    _, place_fips_code = parse_us_place_region(region)
    df = simulation.to_input_dataframe()  # Returns person-level data
    person_place_fips = simulation.calculate("place_fips", map_to="person").values
    mask = (person_place_fips == place_fips_code) | (person_place_fips == place_fips_code.encode())
    return simulation_type(dataset=df[mask], reform=reform)  # Filters PERSONS
Expected Behavior
Should filter at the household level, keeping all persons in matching households, as demonstrated in the subsample() method in policyengine-core:
# Correct pattern from subsample():
h_df = df.groupby(household_id_column).first()
chosen_household_ids = h_df[mask].index
subset_df = df[df[household_id_column].isin(chosen_household_ids)]
Impact
When running the Mamdani NYC income tax analysis:
- Expected Decile 10 average: ~$-36,149 (from Jupyter notebook)
- Actual Decile 10 average: ~$-15,889 (from app using place filtering)
- Budgetary impact matches (~$8.87B), confirming the filtering captures the right population but calculates averages incorrectly
Root Cause
to_input_dataframe() maps all variables to person level (line 1516 in policyengine-core). When you filter this person-level dataframe directly and create a new simulation, household-level variable calculations become incorrect.
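The household-level pattern described under Expected Behavior can be sketched on toy data as follows. This is a minimal illustration, not the actual fix: the helper name filter_households_by_place, the column names household_id and place_fips, and the FIPS values are all hypothetical, and the real dataframe returned by to_input_dataframe() may name its columns differently.

```python
import pandas as pd


def filter_households_by_place(
    df,
    place_fips_code,
    household_id_column="household_id",  # hypothetical column name
    place_column="place_fips",  # hypothetical column name
):
    """Keep every person row that belongs to a household in the given place."""
    # Collapse the person-level frame to one row per household.
    h_df = df.groupby(household_id_column).first()
    # Build the mask at the household level, not the person level.
    household_mask = h_df[place_column] == place_fips_code
    chosen_household_ids = h_df[household_mask].index
    # Keep all persons in the chosen households.
    return df[df[household_id_column].isin(chosen_household_ids)]


# Toy person-level frame: three households, five persons (illustrative values).
persons = pd.DataFrame(
    {
        "person_id": [1, 2, 3, 4, 5],
        "household_id": [10, 10, 20, 30, 30],
        "place_fips": ["51000", "51000", "12345", "51000", "51000"],
    }
)

subset = filter_households_by_place(persons, "51000")
```

Here the selection is decided once per household and then applied to person rows through the household ID, so a household is either kept whole or dropped whole.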
Note
The UK country filtering (country/ regions) has the same issue: it also uses map_to="person" and filters at the person level.
Related
- PR #223 ("Add place-level (city) filtering for US impact analysis") introduced place filtering
- The UK constituency/local_authority approach avoids this by reweighting instead of filtering