JohT/code-graph-analysis-pipeline


Code Graph Analysis Pipeline

This repository provides an automated code graph analysis pipeline built on jQAssistant and Neo4j. It supports Java and experimental TypeScript analysis, capturing both the structure and evolution of your code base.

Ever wondered which libraries matter most, how your modules build on each other, which parts have few contributors, which files change together, or where structural anomalies emerge?

This project helps uncover such patterns through graph-based analysis, visualization, and machine learning, offering hundreds of expert-level reports for deep code insights.

Curious? Explore the examples at code-graph-analysis-examples and get started with GETTING_STARTED.md 🚀


✨ Features

  • Analyze static code structure as a graph
  • Supports Java Code Analysis
  • Supports TypeScript Code Analysis (experimental)
  • Fully automated pipeline for Java from tool installation to report generation
  • Fully automated pipeline for TypeScript from tool installation to report generation
  • Fully automated local run
  • Easily integrable into your continuous integration pipeline
  • More than 200 CSV reports for dependencies, metrics, cycles, annotations, algorithms and many more
  • Python-generated charts for dependencies, metrics, visibility and many more
  • Markdown summary reports for anomalies, archetypes, git history and many more
  • Anomaly detection powered by unsupervised machine learning and explainable AI
  • Graph structure visualization
  • Automated reference document generation
  • Runtime and library independent automation using shell scripts
  • Tested on macOS (zsh), Linux (bash) and Windows (Git Bash)
  • Comprehensive list of Cypher queries
  • Example analysis for AxonFramework
  • Example analysis for react-router

📰 News

  • November 2025: Removed deprecated (since version 2.x) "graph-visualization" node package
  • November 2025: Treemap charts for anomalies and archetypes
  • October 2025: Graph visualizations for anomaly archetypes
  • October 2025: Anomaly archetypes with markdown summary
  • August 2025: Association rule mining for co-changing files in git history
  • August 2025: Anomaly detection powered by unsupervised machine learning and explainable AI
  • May 2025: Migrated to Neo4j 2025.x and Java 21.

📓 Python Reports

Here is an overview of Python and Markdown reports from code-graph-analysis-examples.

📘 Graph Data Science Reports

This project includes several reports that use Neo4j's Graph Data Science Library. Here are some of them from code-graph-analysis-examples. For a complete list, see the CSV Cypher Query Report Reference.

🎨 Graph Visualization

Here are some fully automated graph visualizations utilizing GraphViz from code-graph-analysis-examples:

📖 Blog Articles

📣 Talks

🛠️ Prerequisites

Run scripts/checkCompatibility.sh to check if all required dependencies are installed and available in your environment.
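scripts/checkCompatibility.sh remains the authoritative check. To illustrate the general idea only, here is a minimal sketch of how a shell script can detect missing commands with command -v. The function name find_missing_commands and the list of commands checked are hypothetical and not taken from the actual script.

```shell
#!/usr/bin/env bash
# Minimal sketch of a dependency check; this is NOT the actual scripts/checkCompatibility.sh.

# Prints each command name from the arguments that cannot be found on PATH.
# Returns 0 if all commands were found, 1 otherwise.
find_missing_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "$cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical selection of tools to check; adjust it to your setup.
missing_commands="$(find_missing_commands java git python3)"
if [ -z "$missing_commands" ]; then
  echo "All checked commands are available"
else
  echo "Missing commands: $(echo "$missing_commands" | tr '\n' ' ')"
fi
```

command -v is the POSIX way to look up a command without running it, which is why it is preferred here over which.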

Additional Prerequisites for Python

  • Python is required for Python reports.
  • Either Conda or Python's built-in module venv is required as the environment manager.
  • For Conda, use for example Miniconda or Anaconda (recommended for Windows).
  • venv requires no additional installation. To use it, set the environment variable USE_VIRTUAL_PYTHON_ENVIRONMENT_VENV to 'true'.
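A minimal sketch of opting into venv. Only the variable name USE_VIRTUAL_PYTHON_ENVIRONMENT_VENV comes from this README; the interpreter name python3 in the optional check is an assumption about your system.

```shell
#!/usr/bin/env bash
# Opt into Python's built-in venv instead of Conda (variable name taken from this README).
export USE_VIRTUAL_PYTHON_ENVIRONMENT_VENV="true"

# Optional sanity check, assuming the interpreter is called python3 on your system:
# "python3 -m venv --help" only succeeds if the venv module can be loaded.
if command -v python3 > /dev/null 2>&1 && python3 -m venv --help > /dev/null 2>&1; then
  echo "venv is available"
else
  echo "python3 with the venv module was not found" >&2
fi
```

The export only applies to the current shell session and the commands started from it; add the line to your shell profile if you want it to persist.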

Additional Prerequisites for Windows

  • If you are using Anaconda3, add the following line to your ~/.bashrc file so that conda.sh is sourced when Git Bash starts:

    . /c/ProgramData/Anaconda3/etc/profile.d/conda.sh

    For other conda package managers or versions, look for a similar script.
  • Run conda init in a Git Bash opened as administrator. Running it without administrator rights usually leads to an error message.
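The ~/.bashrc step for Windows can be automated. The following is a sketch under assumptions: the function name add_conda_source_line is hypothetical, the default path is the Anaconda3 location mentioned in this README, and the script is sourced with "." so that the conda shell function becomes available in the current shell.

```shell
#!/usr/bin/env bash
# Sketch: idempotently add a line that sources conda.sh to a bash startup file.
# Defaults are assumptions based on this README (Anaconda3 in Git Bash on Windows).

add_conda_source_line() {
  local bashrc="${1:-$HOME/.bashrc}"
  local conda_sh="${2:-/c/ProgramData/Anaconda3/etc/profile.d/conda.sh}"
  local line=". \"${conda_sh}\""

  # Refuse to add a line for a script that does not exist (e.g. other install location)
  if [ ! -f "$conda_sh" ]; then
    echo "conda.sh not found at ${conda_sh}" >&2
    return 1
  fi

  # Append the line only once so repeated runs do not duplicate it
  touch "$bashrc"
  if ! grep -qxF "$line" "$bashrc"; then
    echo "$line" >> "$bashrc"
  fi
}
```

For example, call add_conda_source_line without arguments in Git Bash, then open a new Git Bash as administrator and run conda init.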

Additional Prerequisites for analyzing TypeScript

  • Follow the instructions at https://github.com/jqassistant-plugin/jqassistant-typescript-plugin to create a JSON file with the static code information of your TypeScript project.
    This can be as simple as running the following command in your TypeScript project:

    npx --yes @jqassistant/ts-lce
  • Copy the cloned repository or source project into the directory called source within the analysis workspace, so that resetAndScan.sh and, optionally, importGit.sh pick it up during the scan.
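A minimal sketch of the copy step, assuming npx --yes @jqassistant/ts-lce has already been run inside the project and that you pass the path of your analysis workspace. The function name copy_project_into_source is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy a TypeScript project into the "source" directory of an analysis workspace.
# Assumption: ts-lce was already run inside the project directory.

copy_project_into_source() {
  # Strip a trailing slash: BSD/macOS "cp -R dir/" would copy only the contents
  local project_dir="${1%/}"
  local workspace_dir="$2"
  local target="${workspace_dir}/source/$(basename "$project_dir")"

  mkdir -p "${workspace_dir}/source"
  # Copies recursively into source/<project name>; an existing copy is merged and
  # files with the same name are overwritten
  cp -R "$project_dir" "${workspace_dir}/source/"
  echo "$target"
}
```

For example, copy_project_into_source ~/projects/my-app ~/analysis-workspace prints the path of the copied project inside the workspace's source directory.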

🚀 Getting Started

See GETTING_STARTED.md on how to get started on your local machine.

🚀 Integration

See INTEGRATION.md on how to integrate code analysis in your continuous integration pipeline. Currently (2025), only GitHub Actions are supported.

πŸ—οΈ Pipeline and Tools

The Code Structure Analysis Pipeline utilizes GitHub Actions to automate the whole analysis process:

Big shout-out 📣 to all the creators and contributors of these great libraries 👍. Projects like this wouldn't be possible without them. Feel free to create an issue if something is missing or wrong in the list.

πŸƒ Command Reference

COMMANDS.md contains further details on commands and how to do a manual setup.

📃 CSV Cypher Query Report Reference

CSV_REPORTS.md lists all CSV Cypher query result reports inside the results directory. It can be generated as described in Generate CSV Report Reference.

📷 Image Reference

IMAGES.md lists all PNG images inside the results directory. It can be generated as described in Generate Image Reference.

⚙️ Script Reference

SCRIPTS.md lists all shell scripts of this repository including their first comment line as a description. It can be generated as described in Generate Script Reference.

πŸ” Cypher Query Reference

CYPHER.md lists all Cypher queries of this repository including their first comment line as a description. It can be generated as described in Generate Cypher Reference.

Cypher is Neo4j’s graph query language that lets you retrieve data from the graph.
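SCRIPTS.md and CYPHER.md both pair each file with its first comment line. The following hypothetical sketch shows that idea for any file extension and comment prefix. It is not the generator the repository actually uses (see Generate Script Reference and Generate Cypher Reference for that), and the table layout it prints is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the repository's generator for SCRIPTS.md / CYPHER.md:
# list files with a given extension together with their first comment line.

# Usage: list_with_first_comment <directory> <extension> <comment-prefix>
list_with_first_comment() {
  local dir="${1%/}" extension="$2" prefix="$3"
  local file description
  echo "| File | Description |"
  echo "| ---- | ----------- |"
  find "$dir" -type f -name "*.${extension}" | sort | while IFS= read -r file; do
    # Skip a "#!" shebang, take the first remaining line starting with the prefix,
    # then strip the prefix and any whitespace after it
    description="$(grep -v '^#!' "$file" | grep -m 1 "^${prefix}" | sed -e "s|^${prefix}[[:space:]]*||")"
    echo "| ${file#"$dir"/} | ${description} |"
  done
}
```

For example, list_with_first_comment scripts sh '#' lists shell scripts, and list_with_first_comment cypher cypher '//' lists Cypher queries, whose comments start with //.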

🌐 Environment Variable Reference

ENVIRONMENT_VARIABLES.md lists all environment variables supported by the scripts, including their default values and descriptions. It can be generated as described in Generate Environment Variable Reference.
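As an illustration only, here is a sketch that collects default values written in the common shell idiom NAME=${NAME:-"default"}. It assumes scripts declare their defaults in exactly that form and is not the repository's actual generator; the function name extract_env_defaults is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print "NAME,default" for every self-defaulting assignment
# of the form NAME=${NAME:-"default"} or NAME=${NAME:-default}. Requires bash.

extract_env_defaults() {
  local pattern='^[[:space:]]*([A-Z_][A-Z0-9_]*)=\$\{([A-Z_][A-Z0-9_]*):-"?([^}"]*)"?\}'
  local file line
  for file in "$@"; do
    # "|| [ -n ... ]" also processes a last line without a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
      # Only report assignments where the variable defaults to itself
      if [[ $line =~ $pattern ]] && [ "${BASH_REMATCH[1]}" = "${BASH_REMATCH[2]}" ]; then
        echo "${BASH_REMATCH[1]},${BASH_REMATCH[3]}"
      fi
    done < "$file"
  done | sort -u
}
```

For example, extract_env_defaults scripts/*.sh prints one sorted, de-duplicated NAME,default line per variable found.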

📕 Change Log

CHANGELOG.md contains all changes of this repository.

🤔 Questions & Answers

  • How can I run an analysis locally?
    👉 Check the prerequisites. 👉 See Start an analysis in the Commands Reference. 👉 To get started from scratch, see GETTING_STARTED.md.

  • How can I explore the graph manually? 👉 After the analysis, start Neo4j and open the Neo4j Web UI (http://localhost:7474/browser).

  • How can I add a CSV report to the pipeline?
    👉 Put your new Cypher query into the cypher directory or a suitable (new) subdirectory.
    👉 Create a new CSV report script in a domain directory under domains or in scripts/reports. Take overviewCsv.sh as a reference.
    👉 The script is included automatically because of its directory and its name ending with "Csv.sh".

  • How can I analyze a different code base automatically?
    👉 Create a new download script like the ones in the scripts/downloader directory. Take downloadAxonFramework.sh as a reference for Java projects and downloadReactRouter.sh as a reference for TypeScript projects. 👉 After downloading, run analyze.sh. You can also find these steps in the pipeline as a reference.

  • How can I trigger a full re-scan of all artifacts?
    👉 Delete the file artifactsChangeDetectionHash.txt in the artifacts directory. 👉 To additionally re-scan TypeScript projects, delete the file typescriptFileChangeDetectionHashFile.txt in the source directory.

  • How can I disable git log data import?
    👉 Set the environment variable IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT to none. Example:

    export IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="none"

    👉 Alternatively, prepend your command with IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="none":

    IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="none" ./../../scripts/analysis/analyze.sh

    👉 An in-between option is to import only monthly aggregated changes using IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="aggregated":

    IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="aggregated" ./../../scripts/analysis/analyze.sh
  • What changed in version 4 regarding report generation?
    👉 Jupyter notebook execution, PDF generation (ENABLE_JUPYTER_NOTEBOOK_PDF_GENERATION) and --report Jupyter have been removed.
    👉 Use --report All to generate every report (recommended). --report Markdown produces Markdown summaries but is not always a drop-in replacement for the removed Jupyter pipeline on a fresh workspace: some Markdown summaries depend on prior CSV or Python outputs (for example, domains/overview/summary/overviewSummary.sh and domains/external-dependencies/summary/externalDependenciesSummary.sh). To get complete Markdown reports on a fresh workspace, either run --report Csv or --report Python for the affected domains first, or use --report All.
    👉 The 25 explore/*.ipynb notebooks in domains/*/explore/ remain available for interactive exploration but are no longer executed automatically.
    👉 nbconvert is no longer required for automatic report generation and can be uninstalled. If you still want to open the explore/*.ipynb notebooks interactively, you can keep (or install) Jupyter separately.

  • How can I increase the heap memory when scanning large TypeScript projects?
    👉 Use the environment variable TYPESCRIPT_SCAN_HEAP_MEMORY in megabytes (default = 4096):

    TYPESCRIPT_SCAN_HEAP_MEMORY=16384 ./../../scripts/analysis/analyze.sh
  • How can I continue on errors when scanning TypeScript projects instead of cancelling the whole analysis?
    👉 Use the profile Neo4j-latest-continue-on-scan-errors (default = Neo4j-latest):

    ./../../scripts/analysis/analyze.sh --profile Neo4j-latest-continue-on-scan-errors
  • How can I reduce the memory (RAM) consumption?
    👉 Use the profile Neo4j-latest-low-memory (default = Neo4j-latest):

    ./../../scripts/analysis/analyze.sh --profile Neo4j-latest-low-memory
  • How can I increase the memory (RAM) consumption?
    👉 Use the profile Neo4j-latest-high-memory (default = Neo4j-latest):

    ./../../scripts/analysis/analyze.sh --profile Neo4j-latest-high-memory
  • How can I increase the memory (RAM) consumption afterwards, when the setup is already done?
    👉 Run useNeo4jHighMemoryProfile.sh in your analysis working directory, or:

    ./../../domains/neo4j-management/useNeo4jHighMemoryProfile.sh
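The full re-scan answer above comes down to deleting two change detection hash files. A small helper sketch, assuming it is run against an analysis workspace that contains the artifacts and source directories; the function name force_full_rescan and its optional second argument are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: force a full re-scan by deleting the change detection hash files named in this README.
# Usage: force_full_rescan [workspace-dir] [include-typescript: true|false]

force_full_rescan() {
  local workspace_dir="${1:-.}"
  local include_typescript="${2:-false}"

  # Artifacts: deleting this hash file triggers a full re-scan of all artifacts
  rm -f "${workspace_dir}/artifacts/artifactsChangeDetectionHash.txt"

  # Optional: TypeScript projects have their own hash file in the source directory
  if [ "$include_typescript" = "true" ]; then
    rm -f "${workspace_dir}/source/typescriptFileChangeDetectionHashFile.txt"
  fi
}
```

For example, run force_full_rescan . true in your analysis workspace before ./../../scripts/analysis/analyze.sh to re-scan both artifacts and TypeScript projects. rm -f keeps the helper safe to call when a hash file does not exist yet.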

🕸 Web References