PosterReward: Unlocking Accurate Evaluation for High-Quality Graphic Design Generation

PosterReward Overview


👥 Authors

Jianyu Lai¹,²*, Sixiang Chen¹,²*, Jialin Gao²*, Hengyu Shi², Zhongying Liu², Fuxiang Zhai¹, Junfeng Luo², Xiaoming Wei², Lujia Wang¹, Lei Zhu¹,³†

¹HKUST (GZ)  ²Meituan  ³HKUST

*Equal contribution, †Corresponding author


💡 We also have other poster generation/editing works that may interest you ✨

[CVPR 2026] PosterOmni: Generalized Artistic Poster Creation via Task Distillation and Unified Reward Feedback
Sixiang Chen, Jianyu Lai, Jialin Gao, et al.
github arXiv Project Page

[ICLR 2026] PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
Sixiang Chen, Jianyu Lai, Jialin Gao, et al.
github arXiv Project Page


🔥 News & Updates

  • 📄 [2026.03] Official arXiv version released: 2603.29855
  • 🚀 [2026.03] Open-sourced the inference scripts and benchmark pipeline in this repository.
  • 📄 [2026.03] Released the latest public paper draft: PosterReward_Arxiv_released.pdf
  • 🏆 [2026.02] PosterReward was accepted to CVPR 2026.

🌟 Overview

PosterReward Teaser
PosterReward is a reward model for poster generation tasks. It evaluates posters along multiple dimensions and outputs scores, enabling accurate assessment of graphic design quality.

Recent progress in text-rendering image generation makes end-to-end poster creation increasingly feasible, but general-purpose reward models still struggle to assess typography, layout, and design-specific quality. To bridge this gap, PosterReward builds a 70k poster preference dataset from multi-MLLM consensus and introduces a dedicated reward modeling framework for poster assessment.

The paper formalizes poster evaluation with five dimensions:

  • Foundational Visual Quality
  • AI Artifacts
  • Textual Accuracy
  • Prompt Fidelity
  • Aesthetic Value

PosterReward provides three model variants:

  • PosterReward: a two-stage discriminative reward model with analysis -> scoring.
  • PosterReward-Lite: a simplified pointwise variant that omits the analysis module for faster inference.
  • PosterReward-Pairwise: a generative pairwise judge that predicts preference and can output CoT reasoning.
PosterReward Framework
PosterReward training pipeline and model structure diagram. The top shows three reward models with different structures, and the bottom shows the training pipeline. Our training pipeline consists of four cascaded stages: Joint Supervised Fine-Tuning, Joint Rejection Sampling, Score-Module Training, and Reinforcement Learning.

It also introduces two benchmarks:

  • PosterRewardBench: benchmark for reward-model accuracy on poster preference assessment.
  • PosterBench: benchmark for text-to-image poster generation quality.

🚀 Quick Start

🔧 Environment Setup

conda create -n posterreward python=3.10 -y
conda activate posterreward

cd swift
pip install -e .

cd ..
pip install msgspec "qwen_vl_utils>=0.0.14" torchvision diffusers pillow

# For vLLM-based deployment (required for the Analyser in the full PosterReward pipeline and PosterBench):
pip install "torch>=2.8.0" "vllm>=0.11.0"

Note: If you encounter a vLLM engine initialization error, check that your torch and vllm versions are compatible. We have verified that torch==2.8.0 with vllm==0.11.0 works with the ms-swift version included in this repo.

⚡ PosterReward-Lite

PosterReward-Lite is the fastest entry point for single-image pointwise scoring.

  1. Edit MODEL_PATH, IMAGE_PATH, and PROMPT in inference_lite.sh.
  2. Run:
bash inference_lite.sh

⚡ Full PosterReward

The full PosterReward pipeline is a two-stage process:

  1. posterreward_analyser.py generates a detailed multi-dimensional analysis.
  2. posterreward_scorer.py turns that analysis into the final scalar reward.

To use it:

  1. Edit ANALYSER_MODEL, SCORER_MODEL, PROMPT, and IMAGE_PATH in inference_posterreward.sh.
  2. Run:
bash inference_posterreward.sh

Outputs are saved to ./posterreward_output/ by default.

🧠 Model Notes

The released PosterReward family is built on Qwen3-VL-8B:

  • The analysis module is fine-tuned from Qwen3-VL-8B.
  • The scoring module uses a scalar reward head on top of Qwen3-VL-8B.
  • PosterReward-Pairwise is a generative pairwise model fine-tuned from Qwen3-VL-8B.

The training pipeline has four cascaded stages:

  1. Joint Supervised Fine-Tuning
  2. Joint Rejection Sampling Fine-Tuning
  3. Scoring Module Training
  4. Reinforcement Learning Fine-Tuning

🧪 PosterBench

poster_bench/ contains the evaluation pipeline for poster generation benchmarking.

PosterBench contains 250 prompts: 100 cinematic prompts and 150 non-cinematic prompts. Each model generates 8 samples per prompt for stability evaluation.

Pipeline Overview

| Step | Script | Purpose |
|------|--------|---------|
| 1 | step1_generate_images.py | Generate poster images from prompts |
| 2 | step2_vllm_analyze.py | Produce detailed image analysis with a deployed VLM |
| 3 | step3_reward_score.py / step3_reward_score.sh | Score the analyzed samples with PosterReward |
| 4 | step4_metrics_analysis.py / step4_metrics_analysis.sh | Aggregate final benchmark metrics |

Step 1: Generate Images

The default step1_generate_images.py script is configured for a Qwen-Image-2512-style backend and writes outputs to results_qwen_image_2512/.

cd poster_bench

# Edit MODEL_PATH in the script first
python step1_generate_images.py

This repository includes dedicated generation scripts for three backends:

  • step1_generate_images.py for Qwen-Image-2512-style generation
  • step1_generate_images_flux_klein.py
  • step1_generate_images_zimage.py

PosterBench additionally reports closed-source systems such as Nano-Banana-Pro, Seedream-4.5, Nano-Banana, Seedream-4.0, GPT-Image-1, and Seedream-3.0, and open-source systems such as Qwen-Image-2512, Qwen-Image, Z-Image-Turbo, Flux.2-klein-9B, Flux.1-krea-dev, Flux.1-dev, and SD3.5-L.

Step 2: Batch Analysis

bash step2_vllm_deploy.sh

python step2_vllm_analyze.py \
  --model_folder ./results_qwen_image_2512 \
  --output all_models_analysis.jsonl
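Downstream steps consume this JSONL file. As a minimal sketch of reading it back and grouping records per model, assuming each line carries model, image, and analysis fields (the actual schema of all_models_analysis.jsonl may differ, so adjust the keys to your output):

```python
import json

def load_analyses(jsonl_path):
    """Group analysis records by model name.

    Assumes each JSONL line holds at least "model", "image", and
    "analysis" fields; these field names are an illustrative guess,
    not the verified schema of all_models_analysis.jsonl.
    """
    by_model = {}
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            record = json.loads(line)
            by_model.setdefault(record["model"], []).append(record)
    return by_model
```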

The analysis targets the following five dimensions; the current implementation prompt uses equivalent, slightly more descriptive wording for the same evaluation axes:

  1. Foundational Visual Quality
  2. AI Artifacts
  3. Textual Accuracy
  4. Prompt Fidelity
  5. Aesthetic Value

Step 3: Reward Scoring

bash step3_reward_score.sh

The provided shell script is a template and contains path variables that should be edited for your environment.

Step 4: Metrics Analysis

bash step4_metrics_analysis.sh

PosterBench reports four core metrics:

  • Mean: average PosterReward score over all generated posters.
  • Median: median PosterReward score over all generated posters.
  • Std-Avg: mean of the per-prompt standard deviations across the 8 generations; lower is better.
  • Bo8-Avg: Best-of-8 Average, defined as the mean of the highest-scoring sample within each set of 8 generations.
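The four metric definitions above can be sketched in a few lines. The snippet assumes scores are already grouped per prompt; the official step4_metrics_analysis.py may aggregate slightly differently:

```python
import statistics

def posterbench_metrics(scores_per_prompt):
    """Compute the four PosterBench metrics from per-prompt score lists.

    `scores_per_prompt` maps each prompt to the PosterReward scores of
    its generated samples (8 per prompt in the benchmark). This mirrors
    the metric definitions above, not the exact official implementation.
    """
    all_scores = [s for scores in scores_per_prompt.values() for s in scores]
    return {
        "Mean": statistics.mean(all_scores),
        "Median": statistics.median(all_scores),
        # Per-prompt sample standard deviation, then averaged (lower is better).
        "Std-Avg": statistics.mean(
            statistics.stdev(scores) for scores in scores_per_prompt.values()
        ),
        # Best-of-8: highest score within each prompt's set, then averaged.
        "Bo8-Avg": statistics.mean(
            max(scores) for scores in scores_per_prompt.values()
        ),
    }
```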

📚 PosterRewardBench

poster_reward_bench/ contains benchmark resources for evaluating reward models on poster assessment tasks.

Data Preparation

The benchmark images are hosted on Hugging Face. Download and extract them before running the evaluation:

cd poster_reward_bench

# Download from Hugging Face (requires huggingface-cli)
huggingface-cli download MeiGen-AI/PosterReward_v1 PRB_basic_images.tar.gz --repo-type model --local-dir .
huggingface-cli download MeiGen-AI/PosterReward_v1 PRB_advanced_images.tar.gz --repo-type model --local-dir .

# Extract
tar -xzf PRB_basic_images.tar.gz
tar -xzf PRB_advanced_images.tar.gz

After extraction, the directory structure should look like:

poster_reward_bench/
β”œβ”€β”€ PRB_basic_relative.json
β”œβ”€β”€ PRB_advanced_relative.json
β”œβ”€β”€ PRB_basic_images/          # 1,034 images
β”œβ”€β”€ PRB_advanced_images/       # 2,446 images
└── ...

PosterRewardBench has two subsets:

  • PosterRewardBench-Basic: generated by Flux, Flux-Krea, and SD3.5-L, with larger quality variation.
  • PosterRewardBench-Advanced: generated by Seedream-3.0, Seedream-4.0, and Qwen-Image-Lightning, with higher overall quality and smaller quality gaps.

All preference pairs were reviewed by four professional annotators, and only pairs with agreement from at least three annotators were retained.
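As an illustrative sketch of that filtering rule (the actual annotation format in the released data is an assumption here):

```python
from collections import Counter

def filter_by_agreement(pairs, min_agree=3):
    """Keep only preference pairs with sufficient annotator agreement.

    Each pair is (pair_id, votes), where votes is a list of per-annotator
    labels such as "A" or "B". A pair is retained when at least
    `min_agree` annotators chose the same winner, matching the
    3-of-4 rule described above; the real data layout may differ.
    """
    kept = []
    for pair_id, votes in pairs:
        winner, count = Counter(votes).most_common(1)[0]
        if count >= min_agree:
            kept.append((pair_id, winner))
    return kept
```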

For pointwise models, we report accuracy (↑) on MMRB2, HPDv3, PRB-Basic, and PRB-Ad, where:

  • PRB-Basic = PosterRewardBench-Basic
  • PRB-Ad = PosterRewardBench-Advanced
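For a pointwise model, a pair is typically counted as correct when the model scores the human-preferred poster higher. A minimal sketch of that accuracy computation (ties counted as incorrect here, which is one common convention, not necessarily the paper's):

```python
def pairwise_accuracy(examples):
    """Accuracy of a pointwise reward model on preference pairs.

    Each example is (score_chosen, score_rejected); a pair counts as
    correct when the human-preferred poster gets a strictly higher
    score. This is a generic sketch, not the repo's exact eval code.
    """
    correct = sum(1 for chosen, rejected in examples if chosen > rejected)
    return correct / len(examples)
```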

Evaluation Pipeline

Evaluating the full PosterReward model on PosterRewardBench is a two-step process:

Step 1: Deploy the Analyser and generate analyses

cd poster_reward_bench

# Deploy the PosterReward Analyser via vLLM (edit MODEL_PATH in vllm_deploy.sh first)
bash vllm_deploy.sh

# In another terminal, run the analysis generation script
python step1_gen_analysis.py

This generates PRB_basic_relative_with_analysis.json and PRB_advanced_relative_with_analysis.json. The script supports checkpointing and can resume from interruptions.
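A minimal sketch of such resume behaviour, assuming results are stored as a JSON list keyed by an id field (step1_gen_analysis.py may track progress differently):

```python
import json
import os

def resume_pending(items, output_path, key="id"):
    """Return the items that still need analysis.

    Reads previously written results from `output_path` (a JSON list,
    if it exists) and skips items whose `key` already appears there.
    The file layout and `key` field are illustrative assumptions.
    """
    done = set()
    if os.path.exists(output_path):
        with open(output_path, encoding="utf-8") as f:
            done = {record[key] for record in json.load(f)}
    return [item for item in items if item[key] not in done]
```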

Step 2: Score and evaluate accuracy

# Edit MODEL_PATH in batch_eval.sh, then run:
bash batch_eval.sh

Note: Ensure the correct swift binary is in your PATH (from the project's conda environment), as step2_eval.py invokes swift infer via subprocess.


📊 Main Results

🎯 Pointwise Reward Models

Legend: 🥇 best, 🥈 second best within each metric column.

| Model | MMRB2 ↑ | HPDv3 ↑ | PRB-Basic ↑ | PRB-Ad ↑ |
|-------|---------|---------|-------------|----------|
| ImageReward | 53.0 | 58.6 | 60.7 | 49.3 |
| PickScore | 57.6 | 65.6 | 66.7 | 44.1 |
| HPSv2 | 55.0 | 65.3 | 70.8 | 43.7 |
| UnifiedReward* | 56.9 | 59.4 | 60.0 | 52.7 |
| HPSv3 | 58.5 | 76.9 | 72.9 | 41.2 |
| PosterReward-Lite | 60.5 🥇 | 77.1 🥈 | 83.9 🥈 | 85.0 🥈 |
| PosterReward | 59.6 🥈 | 77.8 🥇 | 86.7 🥇 | 86.0 🥇 |

πŸ–ΌοΈ PosterBench Results

Legend: πŸ₯‡ best, πŸ₯ˆ second best. For Std-Avg, lower is better.

Closed-Source Models

| Model | Mean ↑ | Median ↑ | Std-Avg ↓ | Bo8-Avg ↑ |
|-------|--------|----------|-----------|-----------|
| Nano-Banana-Pro | 13.36 🥇 | 13.47 🥇 | 1.91 🥈 | 15.77 🥇 |
| Seedream-4.5* | 12.03 🥈 | 12.09 🥈 | 2.08 | 14.57 🥈 |
| Nano-Banana | 11.60 | 11.69 | 2.17 | 14.49 |
| Seedream-4.0 | 11.46 | 11.44 | 2.06 | 13.93 |
| GPT-Image-1 | 11.16 | 11.38 | 1.75 🥇 | 13.43 |
| Seedream-3.0 | 5.01 | 5.13 | 3.66 | 9.75 |

Open-Source Models

| Model | Mean ↑ | Median ↑ | Std-Avg ↓ | Bo8-Avg ↑ |
|-------|--------|----------|-----------|-----------|
| Qwen-Image-2512 | 11.86 🥇 | 11.63 🥇 | 1.46 🥇 | 13.85 🥇 |
| Qwen-Image | 7.69 🥈 | 7.72 🥈 | 2.55 | 11.06 🥈 |
| Z-Image-Turbo | 7.65 | 7.31 | 2.18 🥈 | 10.47 |
| Flux.2-klein-9B | 7.38 | 7.66 | 3.20 | 11.67 |
| Flux.1-krea-dev | 5.00 | 5.14 | 3.59 | 9.58 |
| Flux.1-dev | 2.55 | 2.42 | 3.85 | 7.81 |
| SD3.5-L | -2.90 | -3.92 | 2.68 | 1.24 |

💾 Model Zoo

| Model | Type | Link |
|-------|------|------|
| PosterReward_Analyser | Generative VLM for multi-dimensional analysis | 🤗 Hugging Face |
| PosterReward-Pairwise | Generative pairwise judge | |
| PosterReward_Scorer | Scalar reward model for scoring | 🤗 Hugging Face |
| PosterReward-Lite | Fast pointwise scorer | 🤗 Hugging Face |

πŸ™ Acknowledgments

  • Thanks to our collaborators and affiliated institutions.
  • Thanks to the open-source community and prior reward modeling research.

πŸ“ Citation

Comming Soon!
