feature: prompt embed caching#71

Merged
JoeGaffney merged 5 commits into main from 2026-01-23 on Jan 26, 2026
Conversation

@JoeGaffney (Owner)
No description provided.

Copilot AI review requested due to automatic review settings January 25, 2026 20:46
@JoeGaffney changed the title from "2026 01 23" to "feature: prompt embed caching" on Jan 25, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces a prompt caching system to optimize text encoding performance across diffusion pipelines, along with several pipeline refactorings and device mapping improvements. The title "2026 01 23" appears to be a date-based identifier.

Changes:

  • Added a new global prompt caching system that stores encoded prompt embeddings on CPU to reduce redundant text encoding operations
  • Refactored SD XL pipeline initialization to use specific pipeline classes instead of AutoPipeline wrappers
  • Added device_map_cpu parameter support for quantized model loading to control initial device placement
  • Integrated prompt caching into the pipeline optimization workflow with an opt-out mechanism
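The CPU-offloaded LRU cache described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual implementation: the class name PromptEmbedCache, the cache key structure, and the max_entries default are all assumptions, and tensor movement is stubbed out for anything that doesn't expose a .to() method.

```python
from collections import OrderedDict


class PromptEmbedCache:
    """Illustrative LRU cache for encoded prompt embeddings.

    Entries are moved to CPU on insert so cached embeddings don't
    occupy VRAM, and moved back to the requested device on lookup.
    """

    def __init__(self, max_entries=32):
        self.max_entries = max_entries
        self._store = OrderedDict()

    @staticmethod
    def _to(obj, device):
        # Tensors expose .to(); pass anything else through unchanged.
        return obj.to(device) if hasattr(obj, "to") else obj

    def get(self, key, device="cuda"):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return tuple(self._to(t, device) for t in self._store[key])

    def put(self, key, embeds):
        # Offload every cached tensor to CPU on insert.
        self._store[key] = tuple(self._to(t, "cpu") for t in embeds)
        self._store.move_to_end(key)
        while len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
```

Keying on the prompt (plus model identifier) means a repeated prompt skips the text encoders entirely, at the cost of a CPU-to-GPU copy on each cache hit.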

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

Summary per file:

  • workers/common/prompt_caching.py: New module implementing an LRU cache for prompt embeddings, with CPU offloading and monkey-patching of pipeline encode_prompt methods
  • workers/common/pipeline_helpers.py: Integrated prompt caching into the optimize_pipeline function and added a device_map_cpu parameter to get_quantized_model
  • workers/common/text_encoders.py: Added a device_map_cpu parameter to Mistral3 text encoder loading
  • workers/images/local/sd_xl.py: Replaced AutoPipeline wrappers with direct pipeline instantiation and added a dedicated get_pipeline_image_to_image function
  • workers/images/local/flux_1.py: Simplified text_to_image_call by removing the unnecessary AutoPipelineForText2Image wrapper
  • workers/images/local/flux_2.py: Added a device_map_cpu parameter to Flux2 transformer loading
  • workers/images/local/qwen_image.py: Explicitly disabled prompt caching for Qwen pipelines
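The monkey-patching of encode_prompt mentioned for workers/common/prompt_caching.py could look roughly like the sketch below. The function name enable_prompt_caching, the key structure, and the plain-dict cache are hypothetical; the PR's real module uses an LRU cache with CPU offloading and an opt-out mechanism not shown here.

```python
def enable_prompt_caching(pipe, cache=None):
    """Illustrative sketch: replace a pipeline's encode_prompt with a
    caching wrapper so repeated prompts skip the text encoders."""
    cache = {} if cache is None else cache
    original = pipe.encode_prompt

    def cached_encode_prompt(prompt, *args, **kwargs):
        # Key on the pipeline class and prompt; a real implementation
        # would likely include negative prompts and other arguments.
        key = (type(pipe).__name__, prompt)
        if key not in cache:
            cache[key] = original(prompt, *args, **kwargs)
        return cache[key]

    pipe.encode_prompt = cached_encode_prompt
    return pipe
```

Patching the instance attribute rather than the class keeps the change scoped to one pipeline object, which is also what makes a per-pipeline opt-out (as done for Qwen above) straightforward.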

Diff excerpt from workers/images/local/sd_xl.py:

    args = {}
    args["variant"] = "fp16"

    def get_pipeline_image_to_image(model_id) -> StableDiffusionXLImg2ImgPipeline:
        pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16, use_safetensors=True)
Copilot AI commented on Jan 25, 2026

The new get_pipeline_image_to_image function is missing the variant="fp16" parameter that is present in the get_inpainting_pipeline function. This inconsistency may cause the img2img pipeline to use a different weight variant than intended. Consider adding variant="fp16" to maintain consistency, or document why this parameter is only needed for the inpainting pipeline.

Suggested change:

    pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        use_safetensors=True,
        variant="fp16",
    )

@JoeGaffney JoeGaffney merged commit 76fd8dc into main Jan 26, 2026
4 checks passed