ComfyUI/comfy/ldm/flux/math.py at d202c2ba7404affd58a2199aeb514b3cc48e0ef3


rattus128 653ceab414 Reduce Peak WAN inference VRAM usage - part II (#10062)

* flux: math: Use addcmul_ to avoid an expensive VRAM intermediate

The RoPE step can be the VRAM peak, and the intermediate tensor that
holds the addition result, allocated before the original operands are
released, can cause an OOM. Accumulate in place with addcmul_ instead.
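A minimal sketch of the addcmul_ idea, assuming a Flux-style RoPE in which `freqs_cis` stores one 2x2 rotation matrix per channel pair, with shape `(1, 1, L, D/2, 2, 2)`. The helpers `make_freqs`, `rope_naive` and `rope_addcmul` are hypothetical and are not the code in math.py. The naive form keeps both products and their sum alive at the same time. The in-place form writes the second product directly into the first product's buffer.

```python
import torch


def make_freqs(seq_len: int, dim: int, theta: float = 10000.0) -> torch.Tensor:
    """Hypothetical helper: RoPE rotation matrices, shape (1, 1, L, D/2, 2, 2)."""
    pos = torch.arange(seq_len, dtype=torch.float32)
    inv_freq = 1.0 / theta ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    ang = torch.outer(pos, inv_freq)  # (L, D/2)
    cos, sin = torch.cos(ang), torch.sin(ang)
    # Row-major [[cos, -sin], [sin, cos]] for each channel pair
    rot = torch.stack([cos, -sin, sin, cos], dim=-1).reshape(seq_len, dim // 2, 2, 2)
    return rot[None, None]


def rope_naive(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # View x (B, H, L, D) as channel pairs: (B, H, L, D/2, 1, 2)
    x_ = x.reshape(*x.shape[:-1], -1, 1, 2)
    # Both products and their sum are full-size tensors alive at once
    out = freqs_cis[..., 0] * x_[..., 0] + freqs_cis[..., 1] * x_[..., 1]
    return out.reshape(*x.shape)


def rope_addcmul(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    x_ = x.reshape(*x.shape[:-1], -1, 1, 2)
    out = freqs_cis[..., 0] * x_[..., 0]
    # out += freqs_cis[..., 1] * x_[..., 1], computed by one fused pointwise
    # kernel that writes into out's buffer: no separate product or sum tensor
    out.addcmul_(freqs_cis[..., 1], x_[..., 1])
    return out.reshape(*x.shape)


torch.manual_seed(0)
x = torch.randn(2, 3, 5, 8)
freqs = make_freqs(seq_len=5, dim=8)
a = rope_naive(x, freqs)
b = rope_addcmul(x, freqs)
```

Both functions compute the same rotation. The difference is only in how many full-size temporaries exist at the peak, which is what matters when RoPE is the point where VRAM runs out.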

* wan: Delete the self attention before cross attention

This saves VRAM when the cross attention and FFN are the VRAM peak.
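A toy illustration of the same lifetime idea, assuming inference under `torch.inference_mode()`. `ToyBlock` is hypothetical and is not WAN's actual block. Once the self-attention output has been added to the residual stream, dropping the last reference to it with `del` lets the allocator reuse that memory before cross attention and the FFN allocate their own buffers.

```python
import torch
import torch.nn as nn


class ToyBlock(nn.Module):
    """Hypothetical block (not WAN's): self-attn -> cross-attn -> FFN."""

    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        y, _ = self.self_attn(x, x, x, need_weights=False)
        x = x + y
        # Drop the last reference to the self-attention output. Under
        # inference_mode nothing else holds it, so its memory is free
        # before cross attention and the FFN run.
        del y
        c, _ = self.cross_attn(x, context, context, need_weights=False)
        x = x + c
        del c
        return x + self.ffn(x)


torch.manual_seed(0)
block = ToyBlock(dim=32, heads=4).eval()
with torch.inference_mode():
    out = block(torch.randn(2, 10, 32), torch.randn(2, 7, 32))
```

With autograd enabled, the attention modules may save their own tensors for the backward pass, so the saving from `del` is most direct at inference time.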

2025-09-27 18:14:16 -04:00

1.9 KiB
