Summary
The QRF model for tip income imputation uses only 4 features: employment_income, age, count_under_18, and count_under_6. Occupation and industry are the strongest predictors of who receives tips, but are not used despite being available in SIPP.
Available SIPP variables (already loaded but unused)
TJB*_OCC — Occupation codes per job (up to 7 jobs)
TJB*_IND — Industry codes per job (up to 7 jobs)
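A minimal sketch of how the per-job columns could be collapsed into one code per person. It assumes the wildcard expands to `TJB1_OCC` through `TJB7_OCC` (and likewise for `_IND`), and that the first reported job is an acceptable stand-in for the main job. The helper name `first_reported_code` is hypothetical and not part of sipp.py.

```python
import pandas as pd


def first_reported_code(df: pd.DataFrame, suffix: str, max_jobs: int = 7) -> pd.Series:
    """Return the first non-missing TJB{i}_{suffix} value in each row.

    Assumption: columns are named TJB1_<suffix> ... TJB7_<suffix>.
    Rows with no reported job stay missing (NaN).
    """
    candidates = [f"TJB{i}_{suffix}" for i in range(1, max_jobs + 1)]
    present = [c for c in candidates if c in df.columns]
    if not present:
        raise KeyError(f"no TJB*_{suffix} columns found")
    # Back-fill across columns so the leftmost column holds the first
    # non-missing value, then keep only that column.
    return df[present].bfill(axis=1).iloc[:, 0]
```

Choosing the job with the highest earnings would likely be a better proxy for the main job, but that depends on per-job earnings columns that this issue does not name.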
Why this matters
Tip income is highly concentrated in specific occupations (food servers, bartenders, hairdressers, etc.) and industries (NAICS 72: Accommodation and Food Services). Without occupation/industry, the model spreads tip income more diffusely across the income distribution, which:
- Understates tip concentration among low-wage service workers
- Reduces the accuracy of distributional impact estimates for "no tax on tips"
- May assign tips to workers in non-tipped occupations
Suggested approach
- Map SIPP occupation/industry codes to CPS occupation/industry codes (or use broad categories like 2-digit NAICS)
- Add these as categorical features to the QRF model in sipp.py
- Ensure the CPS recipient dataset has matching occupation/industry variables for prediction
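The steps above could be sketched as follows. This is only an illustration of the feature-building side: it one-hot encodes a categorical column and aligns the CPS (recipient) columns to the SIPP (donor) columns so the same model can predict on both. The function name `build_features` and the column name `industry_group` are hypothetical. The QRF fit itself is left out because it depends on the implementation used in sipp.py.

```python
import pandas as pd

# The four features the issue says the model currently uses.
BASE_FEATURES = ["employment_income", "age", "count_under_18", "count_under_6"]


def build_features(donor: pd.DataFrame, recipient: pd.DataFrame, cat_cols: list[str]):
    """One-hot encode cat_cols and align recipient columns to the donor's.

    Assumption: both frames already carry the same categorical columns,
    coded with the same category labels (i.e. a crosswalk was applied).
    """
    cols = BASE_FEATURES + cat_cols
    X_donor = pd.get_dummies(donor[cols], columns=cat_cols, dtype=float)
    X_recipient = pd.get_dummies(recipient[cols], columns=cat_cols, dtype=float)
    # Categories missing from the recipient become all-zero columns;
    # categories never seen in the donor are dropped, since the model
    # was not trained on them.
    X_recipient = X_recipient.reindex(columns=X_donor.columns, fill_value=0.0)
    return X_donor, X_recipient
```

One-hot encoding is one option among several. With broad categories the number of added columns stays small, which is a further argument for starting from coarse groups rather than detailed codes.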
Mapping challenge
CPS and SIPP use different occupation/industry classification systems, so a crosswalk may be needed. At minimum, a broad industry indicator (e.g., food services vs. other) would capture most of the signal.
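As a rough sketch of the broad indicator, the function below flags NAICS sector 72 (Accommodation and Food Services). It assumes the industry codes have already been converted to NAICS-style strings. The survey codes would need a crosswalk first, which is not shown here, and the name `food_services_indicator` is hypothetical.

```python
import pandas as pd


def food_services_indicator(naics: pd.Series) -> pd.Series:
    """Return 1 where the NAICS code falls in sector 72, else 0.

    Assumption: `naics` holds NAICS codes (any digit length) as strings
    or numbers; missing values are treated as "other" (0).
    """
    codes = naics.astype("string").str.strip()
    # Every NAICS code in Accommodation and Food Services begins with "72".
    return codes.str.startswith("72", na=False).astype(int)
```

The same binary column can be built on both the SIPP and CPS sides, which avoids the need for a full occupation crosswalk in a first pass.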
Context
This is the highest-impact improvement for closing the gap between PolicyEngine's tip deduction estimate ($4.7B) and JCT's score ($10.0B for FY2026).