Classifies GitHub repositories by their Infrastructure-as-Code tool usage. Scans project files to detect Terraform/OpenTofu, AWS, Azure, Docker, Kubernetes, Ansible, Puppet, Vagrant, Chef, Salt, Pulumi, Google Cloud, and Bicep.
Built as part of a summer 2024 research project at the University of Oregon.
| Abbreviation | Tool |
|---|---|
| TF/OT | Terraform / OpenTofu |
| AWS | Amazon Web Services |
| AZ | Azure |
| DOCK | Docker |
| KUB | Kubernetes |
| ANS | Ansible |
| PUP | Puppet |
| VAG | Vagrant |
| CHEF | Chef |
| SALT | Salt |
| PUL | Pulumi |
| GOOG | Google Cloud |
| BICEP | Bicep |
git clone https://github.com/kashmot2/iac-classifier.git
cd iac-classifier
python main.py

Each tool has its own detection script (e.g., docker_check2.py, ansible_check.py). The main.py orchestrator runs them all against the input CSV of repositories.
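
The orchestration pattern described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in main.py: the detector functions, the `DETECTORS` table, and the `repo`/`files` CSV columns are all hypothetical, and the real scripts may inspect repositories in a different way.

```python
import csv
import io
from typing import Callable, Dict, List


def _basenames(paths: List[str]) -> List[str]:
    """Return lowercase file names without their directory part."""
    return [p.rsplit("/", 1)[-1].lower() for p in paths]


# Hypothetical detectors: each receives a repository's file paths and
# returns True when the tool appears to be in use.
def uses_terraform(paths: List[str]) -> bool:
    return any(p.lower().endswith(".tf") for p in paths)


def uses_docker(paths: List[str]) -> bool:
    return any(
        n == "dockerfile" or n.startswith("docker-compose")
        for n in _basenames(paths)
    )


def uses_vagrant(paths: List[str]) -> bool:
    return "vagrantfile" in _basenames(paths)


# Abbreviations follow the table above; the mapping itself is an assumption.
DETECTORS: Dict[str, Callable[[List[str]], bool]] = {
    "TF/OT": uses_terraform,
    "DOCK": uses_docker,
    "VAG": uses_vagrant,
}


def classify(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Run every detector against every repository row."""
    results = []
    for row in rows:
        paths = row["files"].split(";")
        tools = [abbr for abbr, check in DETECTORS.items() if check(paths)]
        results.append({"repo": row["repo"], "tools": tools})
    return results


# Hypothetical input layout: one row per repository, files separated by ';'.
SAMPLE_CSV = """repo,files
org/app,Dockerfile;src/main.py
org/infra,main.tf;modules/vpc/main.tf;Vagrantfile
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
results = classify(rows)
for entry in results:
    print(entry["repo"], entry["tools"])
```

Keeping each detector as a separate function mirrors the one-script-per-tool layout, so a new tool can be added by registering one more entry in the table.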
main.py — Orchestrator: runs all checks
*_check.py — Individual IaC detection scripts
combine.py — Merges results across tools
opening_csv.py — CSV utilities
*.csv — Input repository dataset
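
As a rough illustration of what merging results across tools could look like, the sketch below folds per-tool hit lists into one entry per repository. It is not the logic in combine.py: the `combine` function and the shape of `per_tool` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Set


def combine(per_tool: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Invert {tool abbreviation -> repos} into {repo -> sorted abbreviations}.

    The input shape is hypothetical; it stands in for whatever each
    detection script writes out.
    """
    merged: Dict[str, List[str]] = defaultdict(list)
    for abbr, repos in per_tool.items():
        for repo in repos:
            merged[repo].append(abbr)
    # Sort both repositories and abbreviations so the output is deterministic.
    return {repo: sorted(tools) for repo, tools in sorted(merged.items())}


# Made-up sample data, for illustration only.
per_tool = {
    "DOCK": {"org/app", "org/infra"},
    "KUB": {"org/infra"},
    "ANS": set(),
}

combined = combine(per_tool)
for repo, tools in combined.items():
    print(repo, ",".join(tools))
```

Repositories that no check flagged do not appear in the merged output here; whether the real pipeline keeps such rows is not stated in this README.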
- iac-data-pipeline — Companion tool that generates JSON + CSV summaries
MIT