Pipelens (WIP)

Pipelens is a project focused on identifying and solving real-world issues in modern data pipelines across enterprise platforms like Snowflake, Databricks, and BigQuery.

Overview

This project builds an agentic AI assistant that analyzes SQL and pipeline logic to detect production-grade issues before they become incidents. It also provides an estimated cost comparison before and after optimization so teams can quantify expected savings from the recommended changes.

Current Scope

  • Platform focus: enterprise data platforms
  • Current input types by object family:
    • SQL-based objects: stored procedures, ad-hoc SQL queries, views, materialized views, UDFs, and DDL scripts
    • Python-based objects: Python stored procedures, Python UDF handlers
    • Spark-based objects: Spark SQL statements, PySpark DataFrame transformation pipelines, and Spark job/notebook code
  • Analysis dimensions: performance, security, and style
  • Output: structured findings plus rewritten code with explanations

Future State

The same review workflow is intended to expand to additional analytics engineering artifacts.

  • Notebooks
  • dbt models
  • SQLMesh models
  • Orchestration models

Proposed Findings

The application returns a ranked FindingsReport with structured review output. Each finding is designed to include (see the schema sketch after this list):

  • Severity rating
  • Plain-language description of the issue
  • Original offending snippet
  • Suggested rewritten code
  • Line-by-line explanation of the change
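A minimal sketch of what that schema could look like, assuming Pydantic v2; the class and field names are illustrative, not the project's final definitions:

from enum import Enum
from pydantic import BaseModel, Field

class Severity(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

class Finding(BaseModel):
    severity: Severity
    description: str = Field(description="Plain-language summary of the issue")
    original_snippet: str
    suggested_rewrite: str
    explanation: list[str] = Field(description="Line-by-line notes on the change")

class FindingsReport(BaseModel):
    object_name: str
    findings: list[Finding]  # ranked, most severe first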

The goal is to give data engineers and analytics engineers the same depth of review they would get from a senior peer, delivered automatically and consistently inside their existing data platform workflow.

Agentic AI High-Level Flow

The application uses a LangGraph StateGraph with specialized nodes connected through a shared AgentState; a wiring sketch follows the How It Works steps below.

[User Input: object name or SQL text]
                 |
                 v
              router
             /      \
     (SP path)      (query path)
             \      /
              v    v
          schema_fetcher
                 |
                 v
     +-----------+-----------+
     |           |           |
     v           v           v
performance_  security_   style_
analyzer      auditor     reviewer
     |           |           |
     +-----------+-----------+
                 |
                 v
           synthesizer
                 |
                 v
         [FindingsReport]

How It Works

  1. The router classifies the input and decides whether the flow should use the stored procedure path or the ad-hoc query path.
  2. The schema_fetcher retrieves SQL text and execution context from the connected platform using read-only retrieval tools.
  3. The performance_analyzer, security_auditor, and style_reviewer run in parallel.
  4. The synthesizer merges, ranks, and formats the final findings.
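A minimal sketch of this wiring, assuming LangGraph's StateGraph API; the node bodies are placeholders and the AgentState fields are illustrative:

import operator
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, START, END

class AgentState(TypedDict):
    user_input: str
    input_kind: str                           # "stored_procedure" or "query"
    sql_text: str
    findings: Annotated[list, operator.add]   # merged across the parallel analyzers

def route(state: AgentState) -> str:
    # Both paths converge on schema_fetcher; they differ in what it retrieves.
    return "sp_path" if state["input_kind"] == "stored_procedure" else "query_path"

placeholder = lambda state: {}                # real node logic goes here

graph = StateGraph(AgentState)
for node in ("router", "schema_fetcher", "performance_analyzer",
             "security_auditor", "style_reviewer", "synthesizer"):
    graph.add_node(node, placeholder)

graph.add_edge(START, "router")
graph.add_conditional_edges("router", route,
                            {"sp_path": "schema_fetcher", "query_path": "schema_fetcher"})
for analyzer in ("performance_analyzer", "security_auditor", "style_reviewer"):
    graph.add_edge("schema_fetcher", analyzer)   # parallel fan-out
    graph.add_edge(analyzer, "synthesizer")      # fan-in before ranking
graph.add_edge("synthesizer", END)

app = graph.compile()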

Architecture Choices

  • LangGraph: stateful orchestration, conditional routing, and parallel fan-out
  • Google Gemini: strong code understanding with structured JSON output support
  • Platform connectors: retrieval of DDL, query history, execution plans, and related metadata from connected enterprise data platforms
  • Pydantic: output validation through the FindingsReport model
  • Jupyter notebooks: required development format for the course project
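As an illustration of the structured JSON output choice, a sketch using the google-genai SDK; the model name, prompt, and schema are placeholders, not the project's actual configuration:

from pydantic import BaseModel
from google import genai

class Finding(BaseModel):
    severity: str
    description: str

client = genai.Client()  # reads GOOGLE_API_KEY from the environment
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Review this SQL for issues: SELECT * FROM orders",
    config={
        "response_mime_type": "application/json",
        "response_schema": list[Finding],
    },
)
findings = response.parsed  # validated list[Finding]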

Memory Strategy

The design uses two memory layers:

  • Short-term memory via LangGraph MemorySaver for follow-up questions in the same session
  • Long-term memory via a local JSON history file so prior analyses can inform later runs

This allows the system to surface regressions and improvements when the same object is reviewed multiple times.
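A minimal sketch of the two layers, assuming LangGraph's MemorySaver checkpointer and reusing the graph object from the wiring sketch above; the history file path is illustrative:

import json
from pathlib import Path

from langgraph.checkpoint.memory import MemorySaver

# Short-term: checkpoint graph state per session thread.
app = graph.compile(checkpointer=MemorySaver())
session = {"configurable": {"thread_id": "session-1"}}

# Long-term: append each analysis to a local JSON history file.
HISTORY_FILE = Path("analysis_history.json")

def record_analysis(object_name: str, report: dict) -> None:
    history = json.loads(HISTORY_FILE.read_text()) if HISTORY_FILE.exists() else []
    history.append({"object": object_name, "report": report})
    HISTORY_FILE.write_text(json.dumps(history, indent=2))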

Tool Use

The schema_fetcher node relies on two read-only retrieval tools:

  • A source retrieval tool to fetch stored procedure DDL or recent ad-hoc query text
  • An execution-plan retrieval tool to fetch a logical execution plan for SQL statements

Both tools run through read-only access patterns on the connected platform.
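A hedged sketch of how these could be exposed to the agent, assuming LangChain's @tool decorator; the function names and the Snowflake commands in the comments are assumptions, not confirmed implementation details:

from langchain_core.tools import tool

@tool
def get_source(object_name: str) -> str:
    """Return stored procedure DDL or recent ad-hoc query text (read-only)."""
    # On Snowflake this could wrap: SELECT GET_DDL('PROCEDURE', '<object_name>')
    raise NotImplementedError("platform connector goes here")

@tool
def get_execution_plan(sql_text: str) -> str:
    """Return a logical execution plan for a SQL statement (read-only)."""
    # On Snowflake this could wrap: EXPLAIN USING TEXT <sql_text>
    raise NotImplementedError("platform connector goes here")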

Safeguards

The current design includes multiple guardrails:

  • Read-only platform access to block DML and DDL execution
  • Input validation for object names
  • Output blocking for unsafe modification statements
  • Pydantic validation for agent responses
  • Secret detection before prompts are sent to the model
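For illustration, a sketch of two of these guardrails; the regular expressions are assumptions, not the project's actual rules:

import re

# Accept only db.schema.object style identifiers (illustrative pattern).
OBJECT_NAME_RE = re.compile(r"^[A-Za-z_][\w$]*(\.[A-Za-z_][\w$]*){0,2}$")

# Block rewritten output containing modification statements (illustrative list).
UNSAFE_RE = re.compile(r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|TRUNCATE|ALTER|GRANT)\b", re.I)

def validate_object_name(name: str) -> bool:
    return bool(OBJECT_NAME_RE.match(name))

def contains_unsafe_statement(sql: str) -> bool:
    return bool(UNSAFE_RE.search(sql))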

Setup

1. Create and activate a virtual environment

python -m venv .venv

Windows:

.venv\Scripts\Activate.ps1

macOS/Linux:

source .venv/bin/activate

2. Install dependencies

pip install -r requirements.txt

Repository Files

  • AGENTS.md
  • PROJECT_DESCRIPTION.ipynb
  • README.md
  • requirements.txt

Status

Pipelens is still in active development. The PROJECT_DESCRIPTION.ipynb notebook is the current design reference, and the implementation will evolve as the capstone is built and tested.

Author

Jayaprakash Sivanandam
