TrackType

Classifies music into 10 genres using audio features extracted with librosa and an XGBoost classifier trained on the GTZAN dataset.

Supported Genres: Blues, Classical, Country, Disco, Hip-Hop, Jazz, Metal, Pop, Reggae, Rock

Quick Start

git clone https://github.com/<your-username>/tracktype.git
cd tracktype
python3 -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate
pip install -r requirements.txt

The pre-trained model is included in the repo, so you can start using it right away:

Web app (upload any song and see the prediction):

streamlit run app/app.py

CLI (predict from the terminal):

python -m src.predict path/to/song.mp3
========================================
  Predicted Genre : jazz
  Confidence      : 87.3%
========================================

Top predictions:
  1. jazz          87.3%
  2. blues         6.2%
  3. classical     3.1%

Retrain the Model

If you want to reproduce the training or experiment with the model yourself:

# download the GTZAN dataset from Kaggle (requires a free Kaggle account)
python -m src.download_data

# train the model (~91% accuracy, saves artifacts to models/)
python -m src.train

The download script also places one sample audio file per genre into data/sample_audio/ for quick testing.

Project Structure

tracktype/
├── app/
│   └── app.py                 # Streamlit web interface
├── data/                      # Dataset CSVs and sample audio (auto-downloaded)
├── models/                    # Pre-trained model, scaler, label encoder
├── src/
│   ├── config.py              # Paths, feature names, constants
│   ├── download_data.py       # Downloads GTZAN dataset from Kaggle
│   ├── feature_extractor.py   # Extracts audio features with librosa
│   ├── model.py               # Loads model and runs inference
│   ├── predict.py             # CLI prediction tool
│   └── train.py               # Training script
├── requirements.txt
└── README.md

How It Works

The input audio file is split into 3-second segments to match the training data. For each segment, librosa extracts 57 features (chroma, spectral centroid, bandwidth, rolloff, zero-crossing rate, MFCCs, etc.). The features are scaled with a MinMaxScaler and fed to an XGBoost classifier. For files spanning multiple segments, the final genre is decided by majority vote across all segments.
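
A minimal sketch of that per-segment pipeline, assuming GTZAN's 22,050 Hz sampling rate (the helper names are illustrative and only a subset of the 57 features is shown; the real list lives in src/feature_extractor.py):

import numpy as np
import librosa

def extract_features(seg, sr):
    # Summarize one segment as mean/variance statistics, mirroring the
    # style of the GTZAN CSV columns (a subset of the full 57 features).
    feats = []
    for f in (librosa.feature.chroma_stft(y=seg, sr=sr),
              librosa.feature.rms(y=seg),
              librosa.feature.spectral_centroid(y=seg, sr=sr),
              librosa.feature.spectral_bandwidth(y=seg, sr=sr),
              librosa.feature.spectral_rolloff(y=seg, sr=sr),
              librosa.feature.zero_crossing_rate(seg)):
        feats += [f.mean(), f.var()]
    for coeff in librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=20):
        feats += [coeff.mean(), coeff.var()]      # per-coefficient stats
    return np.array(feats)

def segment_features(path, segment_seconds=3):
    # Split the file into non-overlapping 3-second chunks, one feature row per chunk.
    y, sr = librosa.load(path, sr=22050)
    hop = segment_seconds * sr
    return np.array([extract_features(y[i:i + hop], sr)
                     for i in range(0, len(y) - hop + 1, hop)])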

The model was trained on the GTZAN features_3_sec.csv with a 70/30 train/test split and reaches ~91% accuracy on the held-out set.
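
The training setup itself is standard; here is a condensed sketch of it (the CSV path and the filename/length/label column names are assumptions, see src/config.py and src/train.py for the real values):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from xgboost import XGBClassifier

# Assumed CSV location and column names
df = pd.read_csv("data/features_3_sec.csv")
X = df.drop(columns=["filename", "length", "label"])
y = LabelEncoder().fit_transform(df["label"])

# 70/30 split, stratified so every genre appears in both halves
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

scaler = MinMaxScaler().fit(X_train)
model = XGBClassifier().fit(scaler.transform(X_train), y_train)
print(f"test accuracy: {model.score(scaler.transform(X_test), y_test):.3f}")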

Design Decisions

The GTZAN dataset ships with two CSVs: features_30_sec.csv (~1,000 rows, one per full clip) and features_3_sec.csv (~10,000 rows, each clip split into 3-second chunks). I initially trained on the 30-second version and got ~71% accuracy. Switching to the 3-second version bumped it to ~91%, simply because the model gets roughly 10x more training examples from the same audio.

That choice, however, created a mismatch at inference time. Users upload full-length songs, not 3-second clips. If you extract features from a whole song and feed them to a model trained on 3-second windows, the feature distributions don't match and the predictions are unreliable. So the prediction pipeline splits the input audio into the same 3-second segments, classifies each one independently, and picks the final genre by majority vote. This keeps the inference distribution aligned with what the model saw during training.
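
A sketch of that voting step, reusing the illustrative segment_features helper from above (model, scaler, and encoder stand in for the artifacts saved in models/):

import numpy as np

def predict_genre(model, scaler, encoder, path):
    # One prediction per 3-second segment, then a majority vote.
    segments = segment_features(path)
    preds = model.predict(scaler.transform(segments))
    winner = np.bincount(preds).argmax()          # most frequent class index
    return encoder.inverse_transform([winner])[0]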
