A minimal MLOps-style batch pipeline that demonstrates reproducibility, observability, and deployment readiness for OHLCV trading-signal pipelines.
```
.
├── run.py            # Main pipeline script
├── config.yaml       # Job configuration (seed, window, version)
├── data.csv          # 10,000-row OHLCV dataset
├── requirements.txt  # Python dependencies
├── Dockerfile        # Container definition
├── metrics.json      # Sample output from a successful run
├── run.log           # Sample log from a successful run
└── README.md         # This file
```
- Python 3.9+
- pip
```bash
pip install -r requirements.txt
```

```bash
python run.py \
  --input data.csv \
  --config config.yaml \
  --output metrics.json \
  --log-file run.log
```

After a successful run, `metrics.json` and `run.log` are written to the current directory, and the final metrics JSON is also printed to stdout.
```bash
docker build -t mlops-task .
docker run --rm mlops-task
```

The container includes `data.csv` and `config.yaml`, runs the pipeline, writes `metrics.json` and `run.log` inside the container, and prints the final metrics JSON to stdout.
To retrieve output files from the container, mount a host directory:
```bash
docker run --rm -v "$(pwd)/output:/app/output" \
  mlops-task \
  python run.py --input data.csv --config config.yaml \
    --output output/metrics.json --log-file output/run.log
```

Exit codes: 0 = success, non-zero = failure.
| Key | Type | Description |
|---|---|---|
| `seed` | int | NumPy random seed for determinism |
| `window` | int | Rolling mean window size (rows) |
| `version` | string | Pipeline version tag |
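The key and type checks in the table above can be sketched as a small validation step. This is illustrative, not the code in `run.py`; the names `REQUIRED_KEYS` and `validate_config` are assumptions, and the input is a dict as `yaml.safe_load` would return it.

```python
# Illustrative config validation for a parsed YAML mapping (hypothetical names).
REQUIRED_KEYS = {"seed": int, "window": int, "version": str}

def validate_config(cfg):
    """Raise ValueError if the config is not a dict with the required typed keys."""
    if not isinstance(cfg, dict):
        raise ValueError("Config must be a mapping")
    for key, expected in REQUIRED_KEYS.items():
        if key not in cfg:
            raise ValueError(f"Missing required config key: {key!r}")
        # bool is a subclass of int in Python, so reject it for int fields
        if expected is int and isinstance(cfg[key], bool):
            raise ValueError(f"Config key {key!r} must be int, got bool")
        if not isinstance(cfg[key], expected):
            raise ValueError(f"Config key {key!r} must be {expected.__name__}")
    return cfg
```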
```yaml
seed: 42
window: 5
version: "v1"
```

- Load & validate config — parse the YAML, assert required keys and types, set `numpy.random.seed`.
- Load & validate dataset — check the file exists, the CSV is parseable and non-empty, and the `close` column is present and numeric.
- Rolling mean — `close.rolling(window=window, min_periods=window).mean()`. The first `window - 1` rows produce `NaN` and are excluded from signal computation.
- Signal — `signal = 1` if `close > rolling_mean`, else `0`. Rows with a `NaN` rolling mean are excluded.
- Metrics — `rows_processed` (valid signal rows), `signal_rate` (mean of the signal), `latency_ms`.
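The rolling-mean, signal, and metrics steps above can be sketched in a few lines of pandas. This is a minimal illustration under the stated rules, not the actual `run.py` implementation; the function name `compute_signal_metrics` and the tiny example frame are assumptions.

```python
import pandas as pd

def compute_signal_metrics(df, window):
    """Rolling-mean crossover signal, mirroring the steps above (illustrative)."""
    rolling_mean = df["close"].rolling(window=window, min_periods=window).mean()
    valid = rolling_mean.notna()  # first window-1 rows are NaN and excluded
    signal = (df.loc[valid, "close"] > rolling_mean[valid]).astype(int)
    return {
        "rows_processed": int(valid.sum()),   # rows with a valid rolling mean
        "signal_rate": float(signal.mean()),  # fraction of rows with signal == 1
    }

# Example: 10 rows with window=5 -> the first 4 rows are excluded, 6 processed
df = pd.DataFrame({"close": [1, 2, 3, 4, 5, 6, 5, 4, 3, 2]})
metrics = compute_signal_metrics(df, window=5)
```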
```json
{
  "version": "v1",
  "rows_processed": 9996,
  "metric": "signal_rate",
  "value": 0.4973,
  "latency_ms": 47,
  "seed": 42,
  "status": "success"
}
```

Note: `rows_processed` is 9996 (not 10,000) because the first 4 rows have no valid rolling mean with `window=5` and are excluded from signal computation. This is deterministic and reproducible.
All validation errors produce an error-format `metrics.json` and a non-zero exit code:
```json
{
  "version": "v1",
  "status": "error",
  "error_message": "Required column 'close' not found. Available columns: [...]"
}
```

Handled error cases:
- Missing input file
- Invalid / malformed CSV
- Empty CSV
- Missing `close` column
- Invalid config structure or missing required fields
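The error path can be sketched as: build the error-format dict shown above, write it to `metrics.json`, and exit non-zero. This is a hedged illustration; `error_metrics` and `fail` are hypothetical names, not functions from `run.py`.

```python
import json
import sys

def error_metrics(version, message):
    """Build the error-format metrics dict shown above (illustrative helper)."""
    return {"version": version, "status": "error", "error_message": message}

def fail(version, message, output_path="metrics.json"):
    """Write error-format metrics and exit with a non-zero code (hypothetical)."""
    payload = error_metrics(version, message)
    with open(output_path, "w") as f:
        json.dump(payload, f, indent=2)
    print(json.dumps(payload), file=sys.stderr)
    sys.exit(1)
```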
Running the pipeline multiple times with the same `config.yaml` and `data.csv` always produces identical `metrics.json` values. The `seed` field in the config controls `numpy.random.seed`, ensuring deterministic output.
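A determinism check can be sketched as running the computation twice with the same seed and comparing the metric. Note this demo generates synthetic close prices from the seed for self-containment, whereas the real pipeline reads the fixed `data.csv`; `run_once` is an illustrative name.

```python
import numpy as np
import pandas as pd

def run_once(seed, n=1000, window=5):
    """Seed NumPy, build a synthetic price series, and compute signal_rate."""
    np.random.seed(seed)
    close = pd.Series(100 + np.random.randn(n).cumsum())
    rolling_mean = close.rolling(window=window, min_periods=window).mean()
    valid = rolling_mean.notna()
    signal = (close[valid] > rolling_mean[valid]).astype(int)
    return round(float(signal.mean()), 4)

# Two runs with the same seed yield identical metrics
assert run_once(42) == run_once(42)
```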