Blog: Getting structured data out of images with Granite Vision 4.1 by planetf1 · Pull Request #48 · generative-computing/mellea-website

planetf1 · 2026-05-20T13:28:36Z

What this is showing off

Vision models return prose. The point of this post is that they don't have to.

The blog demonstrates Mellea's extraction pattern — pass a format= Pydantic model to
m.instruct(), get a typed Python object back instead of a string. No JSON prompt
engineering, no json.JSONDecodeError handlers, no post-processing regex. The return type
is the contract, and constrained decoding enforces it.

It then builds up two layers of validation on top:

requirements= — plain-English semantic constraints (date format, positive totals).
The model retries with the failed requirement injected into the repair prompt.
IVR validation_fn — programmatic arithmetic check (line items × quantities = subtotal).
The failure reason gets fed back into the repair prompt verbatim.

The receipt image is synthetic (PIL-generated) with a thermal-printer smudge over part of
the subtotal, giving the validation layers something realistic to catch.

Strategy fix (latest commit): Both validation sections now use RepairTemplateStrategy
instead of RejectionSamplingStrategy. RejectionSamplingStrategy.repair() returns the
unchanged action/context — same prompt, no feedback. RepairTemplateStrategy builds a
repair prompt with the failed requirement or ValidationResult.reason injected, which is
what the surrounding blog prose describes.

Status: Draft — scenario still being refined

The Mellea API usage and code structure are stable. The receipt scenario is still being
iterated. Detection is reliable: the date format requirement consistently detects 22/03/2026
and the arithmetic check confirms extractions are correct. Repair of the date format issue
at 4b scale is not guaranteed (the blog's conclusion now says this explicitly). Receipt values
may change before publication.

Model availability — why this is blocked

The blog is written for Ollama, which is the right default for a local-first post. The
problem: Ollama requires GGUF format, and Granite Vision 4.1 is only available as full
bfloat16 safetensors on Hugging Face right now (~8 GB download, not a 4-bit quantized GGUF
like you'd get from ollama pull).

Ollama cannot load safetensors directly — it needs IBM to publish a GGUF to the Ollama
library (or a community conversion to appear). Until then, the testing path is mlx-vlm on
Apple Silicon or vLLM, both of which can serve safetensors directly.

Watch https://ollama.com/library for granite-vision-4.1. When it lands: remove the
editorial note, verify ollama pull granite-vision-4.1 works, flip to ready.

Reviewing now

Follow the editorial note at the top of the post. Short version:

Set up a clean environment and start the model server:

mkdir granite-vision-test && cd granite-vision-test
uv init --bare --python 3.12
uv add mlx-vlm mellea pillow
uv run python -m mlx_vlm.server --model ibm-granite/granite-vision-4.1-4b

Model downloads ~8 GB on first run (full bfloat16 safetensors — larger than an Ollama pull).
Serves at http://localhost:8080/v1.

In each code snippet, swap the session setup from:

m = start_session(model_id="granite-vision-4.1")

to:

m = MelleaSession(OpenAIBackend("ibm-granite/granite-vision-4.1-4b",
                                base_url="http://localhost:8080/v1", api_key="mlx"))

Test plan

npm run dev — confirm post renders at /blogs/granite-vision-structured-extraction
Receipt image displays correctly (smudge on subtotal visible)
Code syntax highlighting looks right on all blocks
Run the code against mlx-vlm per the instructions above

🤖 Generated with Claude Code

Blog post covering m.instruct() + format= + ImageBlock for typed receipt extraction, building up through requirements= and IVR validation_fn. Includes a synthetic receipt image generated with PIL. Assisted-by: Claude Code

Assists-by: Claude Code

- Add `text` language tag to output fence (fixes MD040 lint failure) - Wrap check_line_totals with simple_validate() — validation_fn expects Callable[[Context], ValidationResult], not str directly - pip install → uv add (consistent with other Mellea blogs) - Add conclusion section with recap and cross-references to docs.mellea.ai Assisted-by: Claude Code

…on blog - Replace line-item arithmetic check with subtotal+tax=total verification; the old check failed because granite3.2-vision reads discounts as positive - Rewrite 'What we covered' as narrative 'From narration to data' section Assisted-by: Claude Code

- New receipt image: 6 line items with smudged subtotal digit - Expanded editorial note: marks as draft, notes scenario still being iterated, clarifies Ollama not yet available but expected soon - Sync blog body to new receipt values ($79.86 total, no discounts) - IVR section references smudged subtotal as the failure trigger Assisted-by: Claude Code

psschwei · 2026-05-20T14:49:01Z

Don't have a strong opinion here, but assuming it takes a while to get the vision model into Ollama should we consider using vllm in the blog instead?

Switch from RejectionSamplingStrategy to RepairTemplateStrategy in both the requirements= and IVR sections. RejectionSamplingStrategy just retries with the same prompt; RepairTemplateStrategy injects the validation failure reason into the repair prompt — which is what the surrounding prose already describes. Also promote "Going further" from bold text to a ## heading, and add a paragraph to the conclusion making detection vs. repair guarantees explicit. Assisted-by: Claude Code

ajbozarth

I'll walk through the blog and try it out myself when I have bandwidth, but to start heres a small review from Claude:

Code checks out against current mellea source — APIs, imports, signatures, and the RepairTemplateStrategy switch all verify. Front matter and asset are good. Snippet syntax checks pass; live execution skipped (model not in Ollama yet, per the editorial note). de-llmify score 1.

A few inline notes below. Pre-publish blockers (editorial note removal, Ollama availability) are already tracked in the PR description.

planetf1 · 2026-05-21T07:40:56Z

Don't have a strong opinion here, but assuming it takes a while to get the vision model into Ollama should we consider using vllm in the blog instead?

vllm tends to target more the operational audience rather than developer I think but also doesn't have good macOS support. Whilst there's a fork supporting mlx, it doesn't support vision models. I'd actually suggest sticking with the mlx-vlm workaround if the gguf takes a while for mac. But we could add vllm as the linux workaround. Ollama remains the least friction, so will look into likely timescales.

- Add representative-output note after prose output block (line 89) - Apply suggested rewrite: "The point of wiring the check programmatically is that a silent wrong answer is no longer possible. Repair success is a separate question." - Tighten conclusion: drop recap paragraphs covering format=, requirements=, and backend portability (all covered inline); lead with the detection-vs-repair framing which is the only new content Assisted-by: Claude Code

planetf1 · 2026-05-21T07:46:07Z

Thanks @psschwei @ajbozarth for comments. There are still two areas I'm working on

the likely timescale for ollama support (sometimes new model releases require changes). If long, fallback could be to add vllm for linux users
refining the actual scenario - I'm not happy with the run-through currently. Needs a bit more work on the scenario, prompts & mellea constructs. (hence draft - but wanted to share the overall idea and get general review)

psschwei · 2026-05-21T11:40:00Z

vllm tends to target more the operational audience rather than developer I think but also doesn't have good macOS support. Whilst there's a fork supporting mlx, it doesn't support vision models. I'd actually suggest sticking with the mlx-vlm workaround if the gguf takes a while for mac. But we could add vllm as the linux workaround.

Sorry, I misread your original post, thought I read mlx-vllm. Mostly was just trying to see if ollama is a hard requirement or if serving the model another way would be fine.

Ollama remains the least friction, so will look into likely timescales.

My understanding is that it is being actively worked on, but may take a few weeks (apparently getting vision models working with ollama is not very straightforward).

Though, and not saying we should just posing as an idea, what if we target colab for running the code? The vision model is 4B which might fit on a free instance, and if not would only take a few credits (<$1) to run for an hour. It would also reduce the setup headaches even more (simply click a cell or two and everything is up and running).

ajbozarth · 2026-05-21T19:39:47Z

The updates addressing my previous review look good, but I'm going to hold off on further review until you've finalized this and moved it out of draft

psschwei · 2026-05-21T21:45:20Z

closing and reopening to get the DCO bot to fire
edit: didn't work 😕

planetf1 requested review from a team and ajbozarth as code owners May 20, 2026 13:28

planetf1 requested a review from psschwei May 20, 2026 13:28

chore: rename receipt image to match blog post slug

89ed7af

Assists-by: Claude Code

planetf1 marked this pull request as draft May 20, 2026 13:31

planetf1 added 3 commits May 20, 2026 14:45

ajbozarth reviewed May 20, 2026

View reviewed changes

Comment thread content/blogs/granite-vision-structured-extraction.md

Comment thread content/blogs/granite-vision-structured-extraction.md Outdated

Comment thread content/blogs/granite-vision-structured-extraction.md Outdated

psschwei closed this May 21, 2026

psschwei reopened this May 21, 2026

psschwei closed this May 21, 2026

psschwei reopened this May 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Blog: Getting structured data out of images with Granite Vision 4.1#48

Blog: Getting structured data out of images with Granite Vision 4.1#48
planetf1 wants to merge 7 commits into
generative-computing:mainfrom
planetf1:blog/granite-vision-structured-extraction

planetf1 commented May 20, 2026 •

edited

Loading

Uh oh!

psschwei commented May 20, 2026

Uh oh!

ajbozarth left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

planetf1 commented May 21, 2026

Uh oh!

planetf1 commented May 21, 2026 •

edited

Loading

Uh oh!

psschwei commented May 21, 2026

Uh oh!

ajbozarth commented May 21, 2026

Uh oh!

psschwei commented May 21, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

planetf1 commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this is showing off

Status: Draft — scenario still being refined

Model availability — why this is blocked

Reviewing now

Test plan

Uh oh!

psschwei commented May 20, 2026

Uh oh!

ajbozarth left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

planetf1 commented May 21, 2026

Uh oh!

planetf1 commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

psschwei commented May 21, 2026

Uh oh!

ajbozarth commented May 21, 2026

Uh oh!

psschwei commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

planetf1 commented May 20, 2026 •

edited

Loading

planetf1 commented May 21, 2026 •

edited

Loading

psschwei commented May 21, 2026 •

edited

Loading