Skip to content

feat: add Large File Support (LFS) subsystem#134

Open
kamir wants to merge 7 commits intoKafScale:mainfrom
kamir:feat/lfs-core-clean
Open

feat: add Large File Support (LFS) subsystem#134
kamir wants to merge 7 commits intoKafScale:mainfrom
kamir:feat/lfs-core-clean

Conversation

@kamir
Copy link
Collaborator

@kamir kamir commented Mar 6, 2026

Summary

Transparent large-file offloading for Kafka via an S3-backed proxy. When enabled, oversized records are automatically replaced with compact envelope references pointing to S3 objects, keeping Kafka partitions lean while supporting arbitrarily large payloads.

What changed (reviewer feedback addressed)

After initial submission, two concerns were raised by @klaudworks:

  1. cmd/lfs-proxy/ duplicated proxy infrastructure — Removed entirely (5,733 lines). LFS is now exclusively a feature-flag on the unified proxy (KAFSCALE_PROXY_LFS_ENABLED=true), which provides partition-aware routing and consumer group support.

  2. pkg/lfs had redundant envelope resolution methods — Consolidated Unwrap() and UnwrapEnvelope() into a single Unwrap() -> (*Envelope, []byte, error) method. Callers that don't need envelope metadata simply ignore the first return value.

Additionally:

  • Fixed CodeQL notices in Python SDK (unused import, empty except clause)
  • Added 123 new LFS unit tests for comprehensive coverage of the unified proxy

Commit history

Commit Description
feb7e4f8 feat: add Large File Support (LFS) subsystem
ebdbdb84 chore: strip demo/deployment targets from Makefile
fc96d604 refactor: consolidate Consumer.Unwrap() and UnwrapEnvelope() into single method
4cc089fe refactor: remove deprecated standalone lfs-proxy binary
c688877f fix: resolve CodeQL notices in Python LFS SDK
623f09f5 test: add comprehensive LFS unit tests for unified proxy

Core Go packages

  • pkg/lfs — envelope codec, S3 client, checksum, consumer/producer/resolver
  • cmd/proxy/lfs_*.go — feature-flagged LFS module in the unified proxy
  • internal/console — LFS consumer, S3 browser, and metrics in web console
  • pkg/operator — LFS proxy Kubernetes resource management

Deployment

  • deploy/helm — Helm values for LFS config (proxy.lfs.enabled)
  • deploy/docker-compose — local development compose (unified proxy with LFS)

Client SDKs

  • Javalfs-client-sdk/java/ (Maven, envelope codec, producer, consumer, resolver)
  • Pythonlfs-client-sdk/python/ (envelope, producer, resolver)
  • JavaScriptlfs-client-sdk/js/ (Node.js SDK with tests)
  • Browserlfs-client-sdk/js-browser/ (TypeScript browser SDK)

Addons

  • Iceberg processoraddons/processors/iceberg-processor/ reads LFS envelopes and sinks to Apache Iceberg tables

Bug fix included

  • Broker sends error response instead of dropping TCP connection on handler errors (fixes "fetching message: EOF" with older Fetch versions)

Test coverage

171 tests in cmd/proxy/ — all passing:

  • lfs_http_test.go (70 tests) — HTTP handlers (/lfs/produce, /lfs/download, /lfs/uploads), request validation, CORS, multipart upload sessions, error responses
  • lfs_test.go (53 new tests) — record batch encoding, varint codec, CRC32 checksums, compression (none/gzip/snappy), Kafka header manipulation, S3 put/get/delete, Prometheus metrics, histogram buckets
  • Pre-existing proxy tests (48 tests) — metadata rewriting, connection pool, group extraction, fetch routing

pkg/lfs/ tests — envelope encode/decode, consumer unwrap (consolidated API), resolver with checksum validation, S3 client streaming

lfs-client-sdk/ tests — Java (JUnit), JavaScript (Jest), Python (pytest)

File count

~170 files changed (down from 212 after removing the standalone lfs-proxy).

Test plan

  • go vet ./... — clean
  • go test ./... — all packages pass
  • License header check — all files pass
  • CodeQL — clean (Python notices resolved)
  • Full LFS unit test coverage for unified proxy (123 new tests)
  • E2E test suite for LFS SDK and iceberg processor included

Note: LFS documentation (docs/lfs/) will follow in a separate PR targeting gh-pages. SDK examples (E60-E72) will be created in the onboarding-tutorials repo after this PR merges.

🤖 Generated with Claude Code

Add transparent large-file offloading for Kafka via an S3-backed proxy.
When enabled, oversized records are automatically replaced with compact
envelope references pointing to S3 objects, keeping Kafka partitions
lean while supporting arbitrarily large payloads.

Core components:
- pkg/lfs: envelope codec, S3 client, checksum, consumer/producer/resolver
- cmd/lfs-proxy: standalone LFS proxy binary with HTTP ingest, SASL,
  TLS, Swagger UI, and ops tracker
- cmd/proxy/lfs_*: feature-flagged LFS module merged into unified proxy
- internal/console: LFS consumer, S3 browser, and metrics in web console
- pkg/operator: LFS proxy resource management for Kubernetes operator
- deploy/helm: Helm templates for lfs-proxy deployment and monitoring

Client SDKs (Java, Python, JavaScript, browser):
- lfs-client-sdk/: multi-language SDKs for producing/consuming LFS records

Iceberg processor addon:
- addons/processors/iceberg-processor: reads LFS envelopes and sinks to
  Apache Iceberg tables

Also includes:
- Broker fix: send error response instead of dropping TCP connection on
  handler errors (fixes "fetching message: EOF" with older Fetch versions)
- E2E test suite for LFS proxy, SDK, and iceberg processor
- CI/Helm/Docker updates for LFS proxy build and deployment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
kamir added a commit to kamir/kafscale that referenced this pull request Mar 6, 2026
HTML report showing the 212 files included in the clean LFS PR
and the 5,912 files excluded (node_modules, build artifacts,
personal configs, docs deferred to separate PRs).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
novatechflow
novatechflow previously approved these changes Mar 6, 2026
Copy link
Collaborator

@novatechflow novatechflow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank @kamir - awesome! Large file support is a true issue on plain Kafka and provider-based ones!

@novatechflow novatechflow requested a review from klaudworks March 6, 2026 08:19
…gitignore

Enforce strict separation: core platform PR must not contain demo scripts,
deployment guides, staging infra, or spring-boot demo targets. Only core
LFS additions remain (build-sdk, docker-build-lfs-proxy, test-lfs-proxy-broker).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@klaudworks
Copy link
Collaborator

I just noticed that this PR still duplicates the whole kafka proxy logic in cmd/lfs-proxy/*.go

@klaudworks
Copy link
Collaborator

klaudworks commented Mar 6, 2026

pkg/lfs looks like it is quite bloated. There are four different ways to resolve an LFS envelope to its payload: Consumer.Unwrap(), Consumer.UnwrapEnvelope(), Resolver.Resolve(), and Record.Value(). Reducing that and some other things to a streamlined interface would also make it easier to use in e.g. the iceberg processor.

This and the duplication mentioned above seem like quick wins before we merge it.

novatechflow
novatechflow previously approved these changes Mar 6, 2026
Copy link
Collaborator

@novatechflow novatechflow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats great! thanks @kamir

@novatechflow
Copy link
Collaborator

pkg/lfs looks like it is quite bloated. There are four different ways to resolve an LFS envelope to its payload: Consumer.Unwrap(), Consumer.UnwrapEnvelope(), Resolver.Resolve(), and Record.Value(). Reducing that and some other things to a streamlined interface would also make it easier to use in e.g. the iceberg processor.

This and the duplication mentioned above seem like quick wins before we merge it.

good points. @kamir, thoughts?

@kamir
Copy link
Collaborator Author

kamir commented Mar 9, 2026

I started working on the comments. Here is the plan:

  1. Commit 1 (A1): Consolidate Consumer.Unwrap() + UnwrapEnvelope() into a single Unwrap() returning (*Envelope, []byte, error). Update all callers.
  2. Commit 2 (A2): Remove cmd/lfs-proxy/ entirely (it's deprecated, unified proxy covers everything).

kamir and others added 2 commits March 9, 2026 17:39
… into single method

Addresses reviewer feedback (klaudworks): pkg/lfs had four different ways
to resolve an LFS envelope. This removes the redundancy between Unwrap()
and UnwrapEnvelope() by consolidating into a single Unwrap() that returns
(*Envelope, []byte, error). Callers that don't need the envelope metadata
simply ignore the first return value.

Before: Unwrap() -> ([]byte, error)           — discards envelope
        UnwrapEnvelope() -> (*Envelope, []byte, error) — preserves envelope

After:  Unwrap() -> (*Envelope, []byte, error) — always available

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses reviewer feedback (klaudworks): cmd/lfs-proxy/ duplicated ~30%
of the unified proxy's infrastructure (TCP listener, Kafka protocol
handling, connection management, health checks).

LFS is now exclusively a feature-flag on the unified proxy, enabled with
KAFSCALE_PROXY_LFS_ENABLED=true. The unified proxy provides partition-aware
routing (vs round-robin) and consumer group support.

Removed:
- cmd/lfs-proxy/ (5,733 lines of Go source + tests)
- deploy/docker/lfs-proxy.Dockerfile
- deploy/helm/kafscale/templates/lfs-proxy-*.yaml (6 Helm templates)
- test/e2e/lfs_proxy_*.go (e2e tests that exec'd the standalone binary)
- CI jobs: build-lfs-proxy, e2e-lfs-proxy
- Makefile targets: docker-build-lfs-proxy, test-lfs-proxy-broker

Kept:
- cmd/proxy/lfs_*.go (unified proxy LFS module)
- pkg/lfs/ (shared LFS library)
- api/lfs-proxy/openapi.yaml (API spec)
- lfs-client-sdk/ (all client SDKs)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
kamir and others added 3 commits March 9, 2026 17:56
- Remove unused 'Any' import in envelope.py
- Add explanatory comment to empty except clause in producer.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 123 new tests covering all LFS code paths in the unified proxy:
- lfs_http_test.go (70 tests): HTTP handlers, validation, CORS, multipart sessions
- lfs_test.go (53 tests): record encoding, compression, headers, metrics, histogram

All 171 cmd/proxy tests pass (including pre-existing proxy tests).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@novatechflow
Copy link
Collaborator

@klaudworks - pls review too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants