KynaHui/DecisionTree_SVM_ensembleLearning

Machine Learning Algorithms (KTH Coursework)

This repository contains implementations of fundamental machine learning algorithms developed from scratch. The projects were completed as part of the Machine Learning coursework at KTH Royal Institute of Technology and are divided into three core modules: Decision Trees, Support Vector Machines (SVM), and Ensemble Learning.

Repository Structure

Lab 1: Decision Tree Learning and Pruning

This module focuses on rule-based classification using the MONK's problem datasets.

  • Concepts Implemented: Entropy calculation, information gain, recursive tree building, and reduced-error pruning.
  • Analysis: Evaluated the trade-off between tree complexity and predictive accuracy, specifically analyzing how pruning mitigates overfitting on the noisy MONK-3 dataset compared to the clean MONK-1 dataset.
  • Visualizations (figures not reproduced here):
    • MONK-1 Mean Accuracy
    • MONK-1 Variance
    • MONK-3 Mean Accuracy
    • MONK-3 Variance
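
The entropy and information gain calculations named above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the repository's code: the function names, the list-of-dicts row layout, and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(rows, labels, attribute):
    """Expected entropy reduction from splitting on one attribute.

    `rows` is a list of dicts mapping attribute name -> value
    (a hypothetical layout chosen for this sketch).
    """
    n = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


# Toy data, made up for illustration (not taken from the MONK datasets).
rows = [
    {"a1": 1, "a2": 1},
    {"a1": 1, "a2": 2},
    {"a1": 2, "a2": 1},
    {"a1": 2, "a2": 2},
]
labels = [True, True, False, False]

gain_a1 = information_gain(rows, labels, "a1")  # a1 separates the classes
gain_a2 = information_gain(rows, labels, "a2")  # a2 is uninformative here
```

A recursive tree builder would typically pick the attribute with the largest gain at each node, which is `a1` in this toy example.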

Lab 2: Support Vector Machines (SVM)

This module implements a Support Vector Machine classifier by solving the dual quadratic optimization problem.

  • Concepts Implemented: Maximum margin classification, slack variables for soft margins (weighted by the penalty parameter C), and the kernel trick.
  • Kernels Explored: Linear, Polynomial, and Radial Basis Function (RBF).
  • Analysis: Investigated the impact of different kernel functions and regularization parameters on the decision boundary and overall model generalization.
  • Visualizations (figures not reproduced here):
    • Slack poly 10e5
    • Sigma 1
    • Radial basis function (RBF) kernels
    • Polynomial kernels
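
To make the dual formulation concrete, the sketch below writes out the three kernels and the dual objective, (1/2) Σ_i Σ_j α_i α_j t_i t_j K(x_i, x_j) − Σ_i α_i, which is minimized subject to 0 ≤ α_i ≤ C and Σ_i α_i t_i = 0. It is plain Python for readability and does not call a solver; the function names, the default parameters `p` and `sigma`, the sign convention for the bias, and the two-point example are assumptions for illustration rather than the repository's code.

```python
import math


def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, p=2):
    # p is the polynomial degree (assumed default).
    return (linear_kernel(x, y) + 1) ** p


def rbf_kernel(x, y, sigma=1.0):
    # sigma controls the width of the Gaussian (assumed default).
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def dual_objective(alpha, inputs, targets, kernel):
    """Value of the dual objective that a QP solver would minimize."""
    n = len(alpha)
    quad = sum(
        alpha[i] * alpha[j] * targets[i] * targets[j]
        * kernel(inputs[i], inputs[j])
        for i in range(n)
        for j in range(n)
    )
    return 0.5 * quad - sum(alpha)


def indicator(point, alpha, inputs, targets, bias, kernel):
    """Decision value; its sign is the predicted class.

    The bias is subtracted here, which is one common convention
    (an assumption of this sketch).
    """
    return sum(
        a * t * kernel(point, x) for a, t, x in zip(alpha, targets, inputs)
    ) - bias


# Two-point toy problem whose solution can be checked by hand:
# alpha = (0.5, 0.5) gives w = (1, 0) and bias 0 with the linear kernel.
inputs = [(1.0, 0.0), (-1.0, 0.0)]
targets = [1, -1]
alpha = [0.5, 0.5]

objective = dual_objective(alpha, inputs, targets, linear_kernel)
decision = indicator((2.0, 0.0), alpha, inputs, targets, 0.0, linear_kernel)
```

In a soft-margin setup, the upper bound C would enter only through the box constraint on each α_i, so the objective function itself stays the same across hard and soft margins.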

Lab 3: Naive Bayes and Ensemble Learning

This module explores probabilistic classification and the AdaBoost ensemble method.

  • Concepts Implemented: Naive Bayes classification and the AdaBoost meta-algorithm.
  • Analysis: Compared the performance of a standalone Naive Bayes classifier against an AdaBoost ensemble. Used custom helper functions (labfuns.py) to plot and analyze decision boundaries across different datasets.
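
The reweighting loop at the heart of AdaBoost can be sketched as below. To keep the example short, it swaps in one-dimensional threshold stumps as the weak learner instead of the Naive Bayes classifier used in this module; the function names, the stump learner, and the toy data are all assumptions for illustration and are not taken from the repository.

```python
import math


def stump_predict(x, threshold, polarity):
    """Predict +1 or -1 from a single threshold on a scalar input."""
    return polarity if x >= threshold else -polarity


def train_stump(xs, ts, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(
                w for w, x, t in zip(weights, xs, ts)
                if stump_predict(x, threshold, polarity) != t
            )
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ts, rounds=3):
    """Train `rounds` weak learners, reweighting the data after each."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ts, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this learner
        ensemble.append((alpha, threshold, polarity))
        # Increase weights of misclassified points, decrease the rest.
        weights = [
            w * math.exp(-alpha * t * stump_predict(x, threshold, polarity))
            for w, x, t in zip(weights, xs, ts)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def ensemble_predict(x, ensemble):
    """Sign of the alpha-weighted vote over all weak learners."""
    score = sum(a * stump_predict(x, th, pol) for a, th, pol in ensemble)
    return 1 if score >= 0 else -1


# Toy data (made up): no single threshold separates the classes.
xs = [1, 2, 3, 4, 5, 6]
ts = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ts, rounds=3)
predictions = [ensemble_predict(x, ensemble) for x in xs]
```

On this toy set no single stump classifies every point, while the three-round weighted vote does, which is the kind of standalone-versus-ensemble comparison the analysis above describes.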

Tech Stack

  • Python 3.x
  • NumPy
  • SciPy (for SVM quadratic programming optimization)
  • Matplotlib

About

DD2421 Machine Learning, KTH
