PosterReward: Unlocking Accurate Evaluation for High-Quality Graphic Design Generation

PosterReward Overview


👥 Authors

Jianyu Lai¹,²*, Sixiang Chen¹,²*, Jialin Gao²*, Hengyu Shi², Zhongying Liu², Fuxiang Zhai¹, Junfeng Luo², Xiaoming Wei², Lujia Wang¹, Lei Zhu¹,³†

¹HKUST (GZ)  ²Meituan  ³HKUST

*Equal contribution, †Corresponding author


💡 We also have other poster generation/editing works that may interest you ✨

[CVPR 2026] PosterOmni: Generalized Artistic Poster Creation via Task Distillation and Unified Reward Feedback
Sixiang Chen, Jianyu Lai, Jialin Gao, et al.
github arXiv Project Page

[ICLR 2026] PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
Sixiang Chen, Jianyu Lai, Jialin Gao, et al.
github arXiv Project Page


🔥 News & Updates

  • 📄 [2026.03] Official arXiv version released: 2603.29855
  • 🚀 [2026.03] Open-sourced the inference scripts and benchmark pipeline in this repository.
  • 📄 [2026.03] Released the latest public paper draft: PosterReward_Arxiv_released.pdf
  • 🏆 [2026.02] PosterReward was accepted to CVPR 2026.

🌟 Overview

PosterReward Teaser
PosterReward is a reward model for poster generation tasks. It evaluates posters along multiple dimensions and outputs scores, enabling accurate assessment of graphic design quality.

Recent progress in text-rendering image generation makes end-to-end poster creation increasingly feasible, but general-purpose reward models still struggle to assess typography, layout, and design-specific quality. To bridge this gap, PosterReward builds a 70k poster preference dataset from multi-MLLM consensus and introduces a dedicated reward modeling framework for poster assessment.

The paper formalizes poster evaluation with five dimensions:

  • Foundational Visual Quality
  • AI Artifacts
  • Textual Accuracy
  • Prompt Fidelity
  • Aesthetic Value

PosterReward provides three model variants:

  • PosterReward: a two-stage discriminative reward model with analysis -> scoring.
  • PosterReward-Lite: a simplified pointwise variant that omits the analysis module for faster inference.
  • PosterReward-Pairwise: a generative pairwise judge that predicts preference and can output CoT reasoning.
PosterReward Framework
PosterReward training pipeline and model structure diagram. The top shows three reward models with different structures, and the bottom shows the training pipeline. Our training pipeline consists of four cascaded stages: Joint Supervised Fine-Tuning, Joint Rejection Sampling, Score-Module Training, and Reinforcement Learning.

It also introduces two benchmarks:

  • PosterRewardBench: benchmark for reward-model accuracy on poster preference assessment.
  • PosterBench: benchmark for text-to-image poster generation quality.

🚀 Quick Start

🔧 Environment Setup

conda create -n posterreward python=3.10 -y
conda activate posterreward

cd swift
pip install -e .

cd ..
pip install msgspec "qwen_vl_utils>=0.0.14" torchvision diffusers pillow

# For vLLM-based deployment (required for the Analyser in the full PosterReward pipeline and PosterBench):
pip install "torch>=2.8.0" "vllm>=0.11.0"

Note: If you encounter a vLLM engine initialization error, check that your torch and vllm versions are compatible. We have verified that torch==2.8.0 with vllm==0.11.0 works with the ms-swift version included in this repo.

⚡ PosterReward-Lite

PosterReward-Lite is the fastest entry point for single-image pointwise scoring.

  1. Edit MODEL_PATH, IMAGE_PATH, and PROMPT in inference_lite.sh.
  2. Run:
bash inference_lite.sh

⚡ Full PosterReward

The full PosterReward pipeline is a two-stage process:

  1. posterreward_analyser.py generates a detailed multi-dimensional analysis.
  2. posterreward_scorer.py turns that analysis into the final scalar reward.

To use it:

  1. Edit ANALYSER_MODEL, SCORER_MODEL, PROMPT, and IMAGE_PATH in inference_posterreward.sh.
  2. Run:
bash inference_posterreward.sh

Outputs are saved to ./posterreward_output/ by default.

🧠 Model Notes

The released PosterReward family is built on Qwen3-VL-8B:

  • The analysis module is fine-tuned from Qwen3-VL-8B.
  • The scoring module uses a scalar reward head on top of Qwen3-VL-8B.
  • PosterReward-Pairwise is a generative pairwise model fine-tuned from Qwen3-VL-8B.

The training pipeline has four cascaded stages:

  1. Joint Supervised Fine-Tuning
  2. Joint Rejection Sampling Fine-Tuning
  3. Scoring Module Training
  4. Reinforcement Learning Fine-Tuning

🧪 PosterBench

poster_bench/ contains the evaluation pipeline for poster generation benchmarking.

PosterBench contains 250 prompts: 100 cinematic prompts and 150 non-cinematic prompts. Each model generates 8 samples per prompt for stability evaluation.

Pipeline Overview

| Step | Script | Purpose |
|------|--------|---------|
| 1 | step1_generate_images.py | Generate poster images from prompts |
| 2 | step2_vllm_analyze.py | Produce detailed image analysis with a deployed VLM |
| 3 | step3_reward_score.py / step3_reward_score.sh | Score the analyzed samples with PosterReward |
| 4 | step4_metrics_analysis.py / step4_metrics_analysis.sh | Aggregate final benchmark metrics |

Step 1: Generate Images

The default step1_generate_images.py script is configured for a Qwen-Image-2512-style backend and writes outputs to results_qwen_image_2512/.

cd poster_bench

# Edit MODEL_PATH in the script first
python step1_generate_images.py

This repository includes dedicated generation scripts for three backends:

  • step1_generate_images.py for Qwen-Image-2512-style generation
  • step1_generate_images_flux_klein.py
  • step1_generate_images_zimage.py

PosterBench additionally reports closed-source systems such as Nano-Banana-Pro, Seedream-4.5, Nano-Banana, Seedream-4.0, GPT-Image-1, and Seedream-3.0, and open-source systems such as Qwen-Image-2512, Qwen-Image, Z-Image-Turbo, Flux.2-klein-9B, Flux.1-krea-dev, Flux.1-dev, and SD3.5-L.

Step 2: Batch Analysis

bash step2_vllm_deploy.sh

python step2_vllm_analyze.py \
  --model_folder ./results_qwen_image_2512 \
  --output all_models_analysis.jsonl
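Downstream steps consume this JSONL file. As a minimal sketch of reading it back and grouping records per model, assuming each line carries model, image, and analysis fields (the actual schema of all_models_analysis.jsonl may differ, so adjust the keys to your output):

```python
import json

def load_analyses(jsonl_path):
    """Group analysis records by model name.

    Assumes each JSONL line holds at least "model", "image", and
    "analysis" fields; these field names are an illustrative guess,
    not the verified schema of all_models_analysis.jsonl.
    """
    by_model = {}
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            record = json.loads(line)
            by_model.setdefault(record["model"], []).append(record)
    return by_model
```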

The analysis targets the following five dimensions; the current implementation prompt uses equivalent, slightly more descriptive wording for the same evaluation axes:

  1. Foundational Visual Quality
  2. AI Artifacts
  3. Textual Accuracy
  4. Prompt Fidelity
  5. Aesthetic Value

Step 3: Reward Scoring

bash step3_reward_score.sh

The provided shell script is a template and contains path variables that should be edited for your environment.

Step 4: Metrics Analysis

bash step4_metrics_analysis.sh

PosterBench reports four core metrics:

  • Mean: average PosterReward score over all generated posters.
  • Median: median PosterReward score over all generated posters.
  • Std-Avg: mean of the per-prompt standard deviations across the 8 generations; lower is better.
  • Bo8-Avg: Best-of-8 Average, defined as the mean of the highest-scoring sample within each set of 8 generations.
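The four metric definitions above can be sketched in a few lines. The snippet assumes scores are already grouped per prompt; the official step4_metrics_analysis.py may aggregate slightly differently:

```python
import statistics

def posterbench_metrics(scores_per_prompt):
    """Compute the four PosterBench metrics from per-prompt score lists.

    `scores_per_prompt` maps each prompt to the PosterReward scores of
    its generated samples (8 per prompt in the benchmark). This mirrors
    the metric definitions above, not the exact official implementation.
    """
    all_scores = [s for scores in scores_per_prompt.values() for s in scores]
    return {
        "Mean": statistics.mean(all_scores),
        "Median": statistics.median(all_scores),
        # Per-prompt sample standard deviation, then averaged (lower is better).
        "Std-Avg": statistics.mean(
            statistics.stdev(scores) for scores in scores_per_prompt.values()
        ),
        # Best-of-8: highest score within each prompt's set, then averaged.
        "Bo8-Avg": statistics.mean(
            max(scores) for scores in scores_per_prompt.values()
        ),
    }
```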

📚 PosterRewardBench

poster_reward_bench/ contains benchmark resources for evaluating reward models on poster assessment tasks.

Data Preparation

The benchmark images are hosted on Hugging Face. Download and extract them before running the evaluation:

cd poster_reward_bench

# Download from Hugging Face (requires huggingface-cli)
huggingface-cli download MeiGen-AI/PosterReward_v1 PRB_basic_images.tar.gz --repo-type model --local-dir .
huggingface-cli download MeiGen-AI/PosterReward_v1 PRB_advanced_images.tar.gz --repo-type model --local-dir .

# Extract
tar -xzf PRB_basic_images.tar.gz
tar -xzf PRB_advanced_images.tar.gz

After extraction, the directory structure should look like:

poster_reward_bench/
β”œβ”€β”€ PRB_basic_relative.json
β”œβ”€β”€ PRB_advanced_relative.json
β”œβ”€β”€ PRB_basic_images/          # 1,034 images
β”œβ”€β”€ PRB_advanced_images/       # 2,446 images
└── ...

PosterRewardBench has two subsets:

  • PosterRewardBench-Basic: generated by Flux, Flux-Krea, and SD3.5-L, with larger quality variation.
  • PosterRewardBench-Advanced: generated by Seedream-3.0, Seedream-4.0, and Qwen-Image-Lightning, with higher overall quality and smaller quality gaps.

All preference pairs were reviewed by four professional annotators, and only pairs with agreement from at least three annotators were retained.
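As an illustrative sketch of that filtering rule (the actual annotation format in the released data is an assumption here):

```python
from collections import Counter

def filter_by_agreement(pairs, min_agree=3):
    """Keep only preference pairs with sufficient annotator agreement.

    Each pair is (pair_id, votes), where votes is a list of per-annotator
    labels such as "A" or "B". A pair is retained when at least
    `min_agree` annotators chose the same winner, matching the
    3-of-4 rule described above; the real data layout may differ.
    """
    kept = []
    for pair_id, votes in pairs:
        winner, count = Counter(votes).most_common(1)[0]
        if count >= min_agree:
            kept.append((pair_id, winner))
    return kept
```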

For pointwise models, we report accuracy (↑) on MMRB2, HPDv3, PRB-Basic, and PRB-Ad, where:

  • PRB-Basic = PosterRewardBench-Basic
  • PRB-Ad = PosterRewardBench-Advanced
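For a pointwise model, a pair is typically counted as correct when the model scores the human-preferred poster higher. A minimal sketch of that accuracy computation (ties counted as incorrect here, which is one common convention, not necessarily the paper's):

```python
def pairwise_accuracy(examples):
    """Accuracy of a pointwise reward model on preference pairs.

    Each example is (score_chosen, score_rejected); a pair counts as
    correct when the human-preferred poster gets a strictly higher
    score. This is a generic sketch, not the repo's exact eval code.
    """
    correct = sum(1 for chosen, rejected in examples if chosen > rejected)
    return correct / len(examples)
```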

Evaluation Pipeline

Evaluating the full PosterReward model on PosterRewardBench is a two-step process:

Step 1: Deploy the Analyser and generate analyses

cd poster_reward_bench

# Deploy the PosterReward Analyser via vLLM (edit MODEL_PATH in vllm_deploy.sh first)
bash vllm_deploy.sh

# In another terminal, run the analysis generation script
python step1_gen_analysis.py

This generates PRB_basic_relative_with_analysis.json and PRB_advanced_relative_with_analysis.json. The script supports checkpointing and can resume from interruptions.
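A minimal sketch of such resume behaviour, assuming results are stored as a JSON list keyed by an id field (step1_gen_analysis.py may track progress differently):

```python
import json
import os

def resume_pending(items, output_path, key="id"):
    """Return the items that still need analysis.

    Reads previously written results from `output_path` (a JSON list,
    if it exists) and skips items whose `key` already appears there.
    The file layout and `key` field are illustrative assumptions.
    """
    done = set()
    if os.path.exists(output_path):
        with open(output_path, encoding="utf-8") as f:
            done = {record[key] for record in json.load(f)}
    return [item for item in items if item[key] not in done]
```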

Step 2: Score and evaluate accuracy

# Edit MODEL_PATH in batch_eval.sh, then run:
bash batch_eval.sh

Note: Ensure the correct swift binary is in your PATH (from the project's conda environment), as step2_eval.py invokes swift infer via subprocess.


📊 Main Results

🎯 Pointwise Reward Models

Legend: 🥇 best, 🥈 second best within each metric column.

| Model | MMRB2 ↑ | HPDv3 ↑ | PRB-Basic ↑ | PRB-Ad ↑ |
|-------|---------|---------|-------------|----------|
| ImageReward | 53.0 | 58.6 | 60.7 | 49.3 |
| PickScore | 57.6 | 65.6 | 66.7 | 44.1 |
| HPSv2 | 55.0 | 65.3 | 70.8 | 43.7 |
| UnifiedReward* | 56.9 | 59.4 | 60.0 | 52.7 |
| HPSv3 | 58.5 | 76.9 | 72.9 | 41.2 |
| PosterReward-Lite | 60.5 🥇 | 77.1 🥈 | 83.9 🥈 | 85.0 🥈 |
| PosterReward | 59.6 🥈 | 77.8 🥇 | 86.7 🥇 | 86.0 🥇 |

πŸ–ΌοΈ PosterBench Results

Legend: πŸ₯‡ best, πŸ₯ˆ second best. For Std-Avg, lower is better.

Closed-Source Models

| Model | Mean ↑ | Median ↑ | Std-Avg ↓ | Bo8-Avg ↑ |
|-------|--------|----------|-----------|-----------|
| Nano-Banana-Pro | 13.36 🥇 | 13.47 🥇 | 1.91 🥈 | 15.77 🥇 |
| Seedream-4.5* | 12.03 🥈 | 12.09 🥈 | 2.08 | 14.57 🥈 |
| Nano-Banana | 11.60 | 11.69 | 2.17 | 14.49 |
| Seedream-4.0 | 11.46 | 11.44 | 2.06 | 13.93 |
| GPT-Image-1 | 11.16 | 11.38 | 1.75 🥇 | 13.43 |
| Seedream-3.0 | 5.01 | 5.13 | 3.66 | 9.75 |

Open-Source Models

| Model | Mean ↑ | Median ↑ | Std-Avg ↓ | Bo8-Avg ↑ |
|-------|--------|----------|-----------|-----------|
| Qwen-Image-2512 | 11.86 🥇 | 11.63 🥇 | 1.46 🥇 | 13.85 🥇 |
| Qwen-Image | 7.69 🥈 | 7.72 🥈 | 2.55 | 11.06 🥈 |
| Z-Image-Turbo | 7.65 | 7.31 | 2.18 🥈 | 10.47 |
| Flux.2-klein-9B | 7.38 | 7.66 | 3.20 | 11.67 |
| Flux.1-krea-dev | 5.00 | 5.14 | 3.59 | 9.58 |
| Flux.1-dev | 2.55 | 2.42 | 3.85 | 7.81 |
| SD3.5-L | -2.90 | -3.92 | 2.68 | 1.24 |

💾 Model Zoo

| Model | Type | Link |
|-------|------|------|
| PosterReward_Analyser | Generative VLM for multi-dimensional analysis | 🤗 Hugging Face |
| PosterReward-Pairwise | Generative pairwise judge | |
| PosterReward_Scorer | Scalar reward model for scoring | 🤗 Hugging Face |
| PosterReward-Lite | Fast pointwise scorer | 🤗 Hugging Face |

πŸ™ Acknowledgments

  • Thanks to our collaborators and affiliated institutions.
  • Thanks to the open-source community and prior reward modeling research.

πŸ“ Citation

Comming Soon!
