Conversation
Pull request overview
This PR introduces a prompt caching system to optimize text encoding performance across diffusion pipelines, along with several pipeline refactorings and device mapping improvements. The title "2026 01 23" appears to be a date-based identifier.
Changes:
- Added a new global prompt caching system that stores encoded prompt embeddings on CPU to reduce redundant text encoding operations
- Refactored SD XL pipeline initialization to use specific pipeline classes instead of AutoPipeline wrappers
- Added `device_map_cpu` parameter support for quantized model loading to control initial device placement
- Integrated prompt caching into the pipeline optimization workflow with an opt-out mechanism
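The caching scheme described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation: the class name, method names, and `max_entries` default are assumptions. Embeddings are assumed to be torch-style objects exposing `.detach()` and `.to(device)` (e.g. `torch.Tensor`), stored on CPU and moved back to the compute device on a hit.

```python
from collections import OrderedDict

# Hypothetical sketch of a global LRU prompt-embedding cache with CPU
# offloading. Values can be any object with .detach()/.to(device) methods
# (e.g. torch.Tensor); names here are illustrative, not the PR's API.
class PromptEmbeddingCache:
    def __init__(self, max_entries=64):
        self.max_entries = max_entries
        self._cache = OrderedDict()  # prompt text -> embedding stored on CPU

    def get(self, prompt, device):
        emb = self._cache.get(prompt)
        if emb is None:
            return None  # cache miss: caller runs the real text encoder
        self._cache.move_to_end(prompt)  # mark as most recently used
        return emb.to(device)  # rehydrate onto the compute device on a hit

    def put(self, prompt, embedding):
        self._cache[prompt] = embedding.detach().to("cpu")  # offload to CPU
        self._cache.move_to_end(prompt)
        if len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)  # evict least recently used entry
```

Keeping the entries on CPU trades a small host-to-device copy on each hit for GPU memory headroom, which is the point of the offloading design.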
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| workers/common/prompt_caching.py | New module implementing LRU cache for prompt embeddings with CPU offloading and monkey-patching of pipeline encode_prompt methods |
| workers/common/pipeline_helpers.py | Integrated prompt caching into optimize_pipeline function and added device_map_cpu parameter to get_quantized_model |
| workers/common/text_encoders.py | Added device_map_cpu parameter to Mistral3 text encoder loading |
| workers/images/local/sd_xl.py | Replaced AutoPipeline wrappers with direct pipeline instantiation and added dedicated get_pipeline_image_to_image function |
| workers/images/local/flux_1.py | Simplified text_to_image_call by removing unnecessary AutoPipelineForText2Image wrapper |
| workers/images/local/flux_2.py | Added device_map_cpu parameter to Flux2 transformer loading |
| workers/images/local/qwen_image.py | Explicitly disabled prompt caching for Qwen pipelines |
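The monkey-patching approach the table attributes to `workers/common/prompt_caching.py` can be sketched as below. This is a hedged illustration under assumptions: the function name `enable_prompt_caching`, the cache object's `get`/`put` interface, and the simplified `encode_prompt(prompt, ...)` signature are all hypothetical (real diffusers pipelines return tuples of several embedding tensors, which this pattern caches as a unit).

```python
import functools

# Hypothetical sketch of wrapping a pipeline's encode_prompt so that repeated
# prompts are served from a cache instead of re-running the text encoder.
def enable_prompt_caching(pipe, cache):
    original = pipe.encode_prompt  # keep a reference to the real encoder

    @functools.wraps(original)
    def cached_encode_prompt(prompt, *args, **kwargs):
        hit = cache.get(prompt)
        if hit is not None:
            return hit  # cache hit: skip the text encoder entirely
        result = original(prompt, *args, **kwargs)
        cache.put(prompt, result)
        return result

    pipe.encode_prompt = cached_encode_prompt  # monkey-patch in place
    return pipe
```

An opt-out mechanism like the one Qwen pipelines use would then simply skip this call for pipelines whose `encode_prompt` output is incompatible with caching.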
In workers/images/local/sd_xl.py:

```python
args = {}
args["variant"] = "fp16"

def get_pipeline_image_to_image(model_id) -> StableDiffusionXLImg2ImgPipeline:
    pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16, use_safetensors=True)
```
The new get_pipeline_image_to_image function is missing the variant="fp16" parameter that is present in the get_inpainting_pipeline function. This inconsistency may cause the img2img pipeline to use a different weight variant than intended. Consider adding variant="fp16" to maintain consistency, or document why this parameter is only needed for the inpainting pipeline.
Suggested change:

```diff
-pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16, use_safetensors=True)
+pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
+    model_id,
+    torch_dtype=torch.bfloat16,
+    use_safetensors=True,
+    variant="fp16",
+)
```
No description provided.