This repository contains starter code for practicing feature engineering with the Ames Housing dataset. You'll create new predictive features, train models, and measure the impact of feature engineering on model performance.
```text
├── data/
│   ├── data_description.txt   # Column definitions and value codes
│   ├── train.csv              # Training data (Ames Housing)
│   └── test.csv               # Test data
├── starter_code.ipynb         # Main working file - complete this
└── README.md                  # This file
```
Install the required packages:

```bash
pip install pandas numpy matplotlib seaborn scikit-learn jupyter
```

The Ames Housing dataset is included in the `data/` directory.
Read `data_description.txt` to understand what each column means and how its values are coded. The directory also contains `train.csv` (training data) and `test.csv` (test data).
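A quick first pass over the data can be sketched with pandas. This is a minimal sketch, not part of the starter code: `summarize_dataset` is a hypothetical helper, and the inline CSV uses real Ames column names (`LotArea`, `Alley`, `PoolQC`, `SalePrice`) with made-up values so the example runs without the file. In the notebook you would pass `pd.read_csv("data/train.csv")` instead. One thing worth checking against `data_description.txt`: pandas treats the literal string `NA` as missing by default, but for columns such as `Alley` and `PoolQC` the documentation uses `NA` to mean the house has no alley access or no pool, so those "missing" values carry information.

```python
import io

import pandas as pd


def summarize_dataset(df: pd.DataFrame, top_n: int = 5) -> dict:
    """Return basic size, type, and missing-value information for a DataFrame."""
    missing = df.isna().sum()
    # Keep only columns with at least one missing value, most-missing first.
    missing = missing[missing > 0].sort_values(ascending=False)
    return {
        "n_rows": df.shape[0],
        "n_cols": df.shape[1],
        "n_numeric": df.select_dtypes(include="number").shape[1],
        "n_non_numeric": df.select_dtypes(exclude="number").shape[1],
        "most_missing": missing.head(top_n).to_dict(),
    }


# Stand-in for pd.read_csv("data/train.csv"): real column names, made-up values.
# Empty fields are parsed as NaN, just like "NA" entries in the real file.
sample_csv = io.StringIO(
    "Id,LotArea,Alley,PoolQC,SalePrice\n"
    "1,8000,,,200000\n"
    "2,9000,,,180000\n"
    "3,10000,Pave,,220000\n"
    "4,11000,Pave,Ex,150000\n"
)
train = pd.read_csv(sample_csv)
summary = summarize_dataset(train)
print(summary)
```

On the real file, the `most_missing` entry is a good prompt to look up each column in `data_description.txt` before deciding how to fill it.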
Open the notebook in one of two ways.

Using Jupyter Notebook:

```bash
jupyter notebook starter_code.ipynb
```

Using VS Code: open `starter_code.ipynb` in VS Code with the Jupyter extension installed.
Your completed notebook should include:

- A baseline Random Forest model trained on the raw features
- At least 5 engineered features based on real estate domain knowledge
- A comparison of model performance with and without the engineered features
- An analysis of which features contribute the most to predictions
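The baseline-versus-engineered comparison can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions: the helper names `cv_rmse` and `compare_feature_sets` are hypothetical, performance is measured as 5-fold cross-validated RMSE (one reasonable choice, not one the assignment prescribes), and the demo uses synthetic data in which price depends on the sum of three square-footage columns, so a single engineered `TotalSF` column should help.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def cv_rmse(X: pd.DataFrame, y: pd.Series, seed: int = 42) -> float:
    """Mean 5-fold cross-validated RMSE of a Random Forest trained on X."""
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    # scikit-learn scorers are "higher is better", so RMSE is returned negated.
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
    return float(-scores.mean())


def compare_feature_sets(X_base: pd.DataFrame, X_eng: pd.DataFrame, y: pd.Series, seed: int = 42) -> dict:
    """Compare baseline vs. engineered feature sets and rank the engineered model's features."""
    base_rmse = cv_rmse(X_base, y, seed)
    eng_rmse = cv_rmse(X_eng, y, seed)
    # Refit on all rows only to read impurity-based feature importances.
    model = RandomForestRegressor(n_estimators=100, random_state=seed).fit(X_eng, y)
    importances = pd.Series(model.feature_importances_, index=X_eng.columns)
    return {
        "baseline_rmse": base_rmse,
        "engineered_rmse": eng_rmse,
        "improvement_pct": 100.0 * (base_rmse - eng_rmse) / base_rmse,
        "importances": importances.sort_values(ascending=False),
    }


# Synthetic stand-in for the Ames data (NOT real prices): price is driven by total area.
rng = np.random.default_rng(0)
n = 400
X_base = pd.DataFrame({
    "TotalBsmtSF": rng.uniform(0, 1500, n),
    "1stFlrSF": rng.uniform(500, 2000, n),
    "2ndFlrSF": rng.uniform(0, 1500, n),
})
y = pd.Series(100 * X_base.sum(axis=1) + rng.normal(0, 5000, n), name="SalePrice")

# Engineered feature: total square footage across floors.
X_eng = X_base.assign(TotalSF=X_base.sum(axis=1))

result = compare_feature_sets(X_base, X_eng, y)
print(f"Baseline RMSE:   {result['baseline_rmse']:,.0f}")
print(f"Engineered RMSE: {result['engineered_rmse']:,.0f}")
print(result["importances"])
```

On the real data, `X_base` might be the numeric columns of `train.csv` minus `Id` and `SalePrice`, with missing values filled (for example with column medians), and `X_eng` the same frame plus your engineered columns. Impurity-based importances can favor high-cardinality features and split credit between correlated ones, so `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.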
Once complete, push your work to GitHub:

```bash
git add .
git commit -m 'completed feature engineering assignment'
git push
```

Then submit your GitHub repository link on the course platform.
Ames Housing dataset (training set):
- 1,460 houses with 79 features
- Target variable: `SalePrice` (home sale price)
- Features include square footage, quality ratings, age, location, amenities, and more
- Full documentation: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/data
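Several of the feature categories listed above map directly onto columns in the dataset. The sketch below shows one way to turn a few of them into engineered features; the specific features are illustrative choices (assumptions to test against the baseline), not a required set. The column names (`TotalBsmtSF`, `1stFlrSF`, `YrSold`, `OverallQual`, and so on) are real Ames columns described in `data_description.txt`; the function name `add_domain_features` and the two demo rows are made up. The `fillna(0)` calls on basement and garage columns assume that a missing value means the house has no basement or garage.

```python
import pandas as pd


def add_domain_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with several real-estate-motivated features added."""
    out = df.copy()
    # Square footage: basement plus above-ground floors.
    out["TotalSF"] = out["TotalBsmtSF"].fillna(0) + out["1stFlrSF"] + out["2ndFlrSF"]
    # Age: years from construction, and from the last remodel, to the sale.
    out["HouseAge"] = out["YrSold"] - out["YearBuilt"]
    out["YearsSinceRemodel"] = out["YrSold"] - out["YearRemodAdd"]
    # Amenities: total bathrooms, counting half baths as 0.5.
    out["TotalBaths"] = (
        out["FullBath"]
        + 0.5 * out["HalfBath"]
        + out["BsmtFullBath"].fillna(0)
        + 0.5 * out["BsmtHalfBath"].fillna(0)
    )
    # Quality x size interaction (assumption: quality matters more on bigger houses).
    out["QualityArea"] = out["OverallQual"] * out["GrLivArea"]
    # Binary amenity flags.
    out["HasPool"] = (out["PoolArea"] > 0).astype(int)
    out["HasGarage"] = (out["GarageArea"].fillna(0) > 0).astype(int)
    return out


# Two made-up houses using real Ames column names.
houses = pd.DataFrame({
    "TotalBsmtSF": [800, None], "1stFlrSF": [900, 1000], "2ndFlrSF": [700, 0],
    "YrSold": [2008, 2010], "YearBuilt": [2000, 1950], "YearRemodAdd": [2005, 1950],
    "FullBath": [2, 1], "HalfBath": [1, 0], "BsmtFullBath": [1, None], "BsmtHalfBath": [0, None],
    "OverallQual": [7, 5], "GrLivArea": [1600, 1000], "PoolArea": [0, 512], "GarageArea": [500, 0],
})
features = add_domain_features(houses)
print(features[["TotalSF", "HouseAge", "TotalBaths", "QualityArea", "HasPool", "HasGarage"]])
```

Returning a copy keeps the raw frame intact, which makes it easy to train the baseline on the original columns and the comparison model on the augmented ones.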