feature: prompt embed caching#71

Merged
JoeGaffney merged 5 commits into main from 2026-01-23 on Jan 26, 2026
Conversation

@JoeGaffney (Owner)
No description provided.

Copilot AI review requested due to automatic review settings January 25, 2026 20:46
@JoeGaffney changed the title from "2026 01 23" to "feature: prompt embed caching" on Jan 25, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces a prompt caching system to optimize text encoding performance across diffusion pipelines, along with several pipeline refactorings and device mapping improvements. The title "2026 01 23" appears to be a date-based identifier.

Changes:

  • Added a new global prompt caching system that stores encoded prompt embeddings on CPU to reduce redundant text encoding operations
  • Refactored SD XL pipeline initialization to use specific pipeline classes instead of AutoPipeline wrappers
  • Added device_map_cpu parameter support for quantized model loading to control initial device placement
  • Integrated prompt caching into the pipeline optimization workflow with an opt-out mechanism
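The CPU-offloaded LRU cache described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual implementation: the class name PromptEmbedCache, the cache key structure, and the max_entries default are all assumptions, and tensor movement is stubbed out for anything that doesn't expose a .to() method.

```python
from collections import OrderedDict


class PromptEmbedCache:
    """Illustrative LRU cache for encoded prompt embeddings.

    Entries are moved to CPU on insert so cached embeddings don't
    occupy VRAM, and moved back to the requested device on lookup.
    """

    def __init__(self, max_entries=32):
        self.max_entries = max_entries
        self._store = OrderedDict()

    @staticmethod
    def _to(obj, device):
        # Tensors expose .to(); pass anything else through unchanged.
        return obj.to(device) if hasattr(obj, "to") else obj

    def get(self, key, device="cuda"):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return tuple(self._to(t, device) for t in self._store[key])

    def put(self, key, embeds):
        # Offload every cached tensor to CPU on insert.
        self._store[key] = tuple(self._to(t, "cpu") for t in embeds)
        self._store.move_to_end(key)
        while len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
```

Keying on the prompt (plus model identifier) means a repeated prompt skips the text encoders entirely, at the cost of a CPU-to-GPU copy on each cache hit.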

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

Summary per file:

  • workers/common/prompt_caching.py: New module implementing an LRU cache for prompt embeddings, with CPU offloading and monkey-patching of pipeline encode_prompt methods
  • workers/common/pipeline_helpers.py: Integrated prompt caching into the optimize_pipeline function and added a device_map_cpu parameter to get_quantized_model
  • workers/common/text_encoders.py: Added a device_map_cpu parameter to Mistral3 text encoder loading
  • workers/images/local/sd_xl.py: Replaced AutoPipeline wrappers with direct pipeline instantiation and added a dedicated get_pipeline_image_to_image function
  • workers/images/local/flux_1.py: Simplified text_to_image_call by removing the unnecessary AutoPipelineForText2Image wrapper
  • workers/images/local/flux_2.py: Added a device_map_cpu parameter to Flux2 transformer loading
  • workers/images/local/qwen_image.py: Explicitly disabled prompt caching for Qwen pipelines
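The monkey-patching of encode_prompt mentioned for workers/common/prompt_caching.py could look roughly like the sketch below. The function name enable_prompt_caching, the key structure, and the plain-dict cache are hypothetical; the PR's real module uses an LRU cache with CPU offloading and an opt-out mechanism not shown here.

```python
def enable_prompt_caching(pipe, cache=None):
    """Illustrative sketch: replace a pipeline's encode_prompt with a
    caching wrapper so repeated prompts skip the text encoders."""
    cache = {} if cache is None else cache
    original = pipe.encode_prompt

    def cached_encode_prompt(prompt, *args, **kwargs):
        # Key on the pipeline class and prompt; a real implementation
        # would likely include negative prompts and other arguments.
        key = (type(pipe).__name__, prompt)
        if key not in cache:
            cache[key] = original(prompt, *args, **kwargs)
        return cache[key]

    pipe.encode_prompt = cached_encode_prompt
    return pipe
```

Patching the instance attribute rather than the class keeps the change scoped to one pipeline object, which is also what makes a per-pipeline opt-out (as done for Qwen above) straightforward.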

Diff excerpt from workers/images/local/sd_xl.py:

    args = {}
    args["variant"] = "fp16"

    def get_pipeline_image_to_image(model_id) -> StableDiffusionXLImg2ImgPipeline:
        pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16, use_safetensors=True)
Copilot AI commented on Jan 25, 2026

The new get_pipeline_image_to_image function is missing the variant="fp16" parameter that is present in the get_inpainting_pipeline function. This inconsistency may cause the img2img pipeline to use a different weight variant than intended. Consider adding variant="fp16" to maintain consistency, or document why this parameter is only needed for the inpainting pipeline.

Suggested change:

    pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        use_safetensors=True,
        variant="fp16",
    )

@JoeGaffney JoeGaffney merged commit 76fd8dc into main Jan 26, 2026
4 checks passed