
2027 Commits

Author SHA1 Message Date
krigeta dc9822b7df Add working Qwen 2512 ControlNet (Fun ControlNet) support (#12359) 2026-02-13 22:23:52 -05:00
comfyanonymous 712efb466b Add left padding to LTXAV text encoder. (#12456) 2026-02-13 21:56:54 -05:00
comfyanonymous 726af73867 Fix some custom nodes. (#12455) 2026-02-13 20:21:10 -05:00
comfyanonymous 831351a29e Support generating attention masks for left padded text encoders. (#12454) 2026-02-13 20:15:23 -05:00
comfyanonymous e1add563f9 Use torch RMSNorm for flux models and refactor hunyuan video code. (#12432) 2026-02-13 15:35:13 -05:00
rattus 8902907d7a dynamic_vram: Training fixes (#12442) 2026-02-13 15:29:37 -05:00
rattus ae79e33345 llama: use a more efficient rope implementation (#12434)
Get rid of the cat and unary negation and inplace add-cmul the two
halves of the rope. Precompute -sin once at the start of the model
rather than every transformer block.

This is slightly faster on both GPU and CPU bound setups.
2026-02-12 19:56:42 -05:00
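The rewrite described in this commit can be sketched in plain Python. This is an illustration only, assuming the common "rotate-half" RoPE formulation applied to a single vector; the function names `rope_reference` and `rope_halves` are hypothetical, and the actual change operates on torch tensors with in-place ops.

```python
import math

def rope_reference(x, cos, sin):
    """Rotate-half RoPE: x * cos + cat(-x2, x1) * sin."""
    half = len(x) // 2
    rotated = [-v for v in x[half:]] + x[:half]  # the cat and unary negation
    return [a * c + r * s for a, c, r, s in zip(x, cos, rotated, sin)]

def rope_halves(x, cos, neg_sin):
    """Same result computed per half, reusing a precomputed -sin."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    # First half: x1 * cos - x2 * sin, written as a multiply-add with -sin.
    out1 = [a * c + b * ns
            for a, b, c, ns in zip(x1, x2, cos[:half], neg_sin[:half])]
    # Second half: x2 * cos + x1 * sin == x2 * cos - x1 * (-sin).
    out2 = [b * c - a * ns
            for a, b, c, ns in zip(x1, x2, cos[half:], neg_sin[half:])]
    return out1 + out2

# Precompute -sin once, then reuse it for every call (every block).
angles = [0.5, 1.5, 0.5, 1.5]
cos = [math.cos(a) for a in angles]
sin = [math.sin(a) for a in angles]
neg_sin = [-s for s in sin]
x = [1.0, 2.0, 3.0, 4.0]
assert all(math.isclose(r, f)
           for r, f in zip(rope_reference(x, cos, sin),
                           rope_halves(x, cos, neg_sin)))
```

The per-half form avoids building the intermediate rotated vector, which is the saving the commit message refers to.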
rattus 117e214354 ModelPatcherDynamic: force load non leaf weights (#12433)
The current behaviour of the default ModelPatcher is to .to a model
only if it's fully loaded, which is how random non-leaf weights get
loaded in non-LowVRAM conditions.

This, however, means they never get loaded in dynamic_vram. In the
dynamic_vram case, force load them to the GPU.
2026-02-12 19:51:50 -05:00
askmyteapot e5ae670a40 Update ace15.py to allow min_p sampling (#12373) 2026-02-11 20:28:48 -05:00
rattus 3fe61cedda model_patcher: guard against none model_dtype (#12410)
Handle the case where the _model_dtype exists but is None with the
intended fallback.
2026-02-11 14:54:02 -05:00
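A guard of this kind might look like the sketch below. Only the `_model_dtype` attribute name comes from the commit message; `resolve_dtype`, its `fallback` parameter, and the example classes are hypothetical.

```python
def resolve_dtype(model, fallback):
    """Return model._model_dtype, or fallback when it is missing or None."""
    dtype = getattr(model, "_model_dtype", None)
    # A plain hasattr() check would accept an attribute that is set to None.
    return fallback if dtype is None else dtype

class WithDtype:
    _model_dtype = "float16"

class WithNone:
    _model_dtype = None

class Without:
    pass

assert resolve_dtype(WithDtype(), "float32") == "float16"
assert resolve_dtype(WithNone(), "float32") == "float32"
assert resolve_dtype(Without(), "float32") == "float32"
```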
rattus 2a4328d639 ace15: Use dynamic_vram friendly trange (#12409)
Factor out the ksampler trange and use it in ACE LLM to prevent the
silent stall at 0 and rate distortion due to first-step model load.
2026-02-11 14:53:42 -05:00
rattus d297a749a2 dynamic_vram: Fix windows Aimdo crash + Fix LLM performance (#12408)
* model_management: lazy-cache aimdo_tensor

These tensors constructed from aimdo-allocations are CPU expensive to
make on the pytorch side. Add a cached version that will be valid with
signature match to fast path past whatever torch is doing.

* dynamic_vram: Minimize fast path CPU work

Move as much as possible inside the not resident if block and cache
the formed weight and bias rather than the flat intermediates. In
extreme layer weight rates this adds up.
2026-02-11 14:50:16 -05:00
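The lazy cache with a signature match could be sketched as follows. This is a generic pattern under stated assumptions, not the code from the commit: `SignatureCache`, the `build` callback, and the tuple used as a signature are all hypothetical stand-ins.

```python
class SignatureCache:
    """Cache one expensive object and rebuild it only when its signature changes."""

    def __init__(self, build):
        self._build = build      # expensive constructor, called on a miss
        self._signature = None
        self._value = None
        self._valid = False

    def get(self, signature):
        # Fast path: a matching signature skips the expensive construction.
        if not self._valid or signature != self._signature:
            self._value = self._build(signature)
            self._signature = signature
            self._valid = True
        return self._value

calls = []

def build_tensor(signature):
    calls.append(signature)
    return {"signature": signature}  # stand-in for the real tensor

cache = SignatureCache(build_tensor)
first = cache.get(("ptr", 1024, "float16"))
second = cache.get(("ptr", 1024, "float16"))   # cache hit, no rebuild
third = cache.get(("ptr", 2048, "float16"))    # signature changed, rebuild
assert first is second and third is not first
assert len(calls) == 2
```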
comfyanonymous 76a7fa96db Make built in lora training work on anima. (#12402) 2026-02-10 22:04:32 -05:00
Kohaku-Blueleaf cdcf4119b3 [Trainer] training with proper offloading (#12189)
* Fix bypass dtype/device moving

* Force offloading mode for training

* training context var

* offloading implementation in training node

* fix wrong input type

* Support bypass load lora model, correct adapter/offloading handling
2026-02-10 21:45:19 -05:00
rattus 123a7874a9 ops: Fix vanilla-fp8 loaded lora quality (#12390)
This was missing the stochastic rounding required for fp8 downcast
to be consistent with model_patcher.patch_weight_to_device.

Missed in testing as I spent too much time with quantized tensors
and overlooked the simpler ones.
2026-02-10 13:38:28 -05:00
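For readers unfamiliar with the term, stochastic rounding can be sketched in plain Python. This is a simplified illustration that assumes a uniform grid of spacing `step`; real fp8 spacing varies with the exponent, and `stochastic_round` is a hypothetical name, not a function from the repository.

```python
import math
import random

def stochastic_round(value, step, rng):
    """Round value to a multiple of step, choosing the upper neighbour with
    probability equal to the fractional distance from the lower one."""
    scaled = value / step
    lower = math.floor(scaled)
    frac = scaled - lower
    return (lower + (1 if rng.random() < frac else 0)) * step

rng = random.Random(0)
samples = [stochastic_round(0.3, 1.0, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
# Each sample lands on a grid point, but the average stays near 0.3,
# whereas round-to-nearest would return 0.0 every time.
assert set(samples) <= {0.0, 1.0}
assert abs(mean - 0.3) < 0.02
```

The point of the technique is that the rounding error averages out across many weights instead of biasing them all in the same direction.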
rattus f719f9c062 sd: delay VAE dtype archive until after override (#12388)
VAEs have host specific dtype logic that should override the dynamic
_model_dtype. Defer the archiving of model dtypes until after.
2026-02-10 13:37:46 -05:00
rattus fe053ba5eb mp: dont deep-clone objects from model_options (#12382)
If there are non-trivial Python objects nested in the model_options, this
causes all sorts of issues. Traverse lists and dicts so clones can safely
override settings and BYO objects, but stop there on the deep clone.
2026-02-10 13:37:17 -05:00
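A container-only clone of the kind described might look like this sketch. The helper name `clone_options` and the example keys are hypothetical; the commit message only states that lists and dicts are traversed and that other objects are not deep-cloned.

```python
def clone_options(value):
    """Copy dict and list containers recursively; share everything else."""
    if isinstance(value, dict):
        return {k: clone_options(v) for k, v in value.items()}
    if isinstance(value, list):
        return [clone_options(v) for v in value]
    return value  # arbitrary objects are passed through by reference

class Patch:
    """Stand-in for a non-trivial object that should not be deep-copied."""

patch = Patch()
options = {"transformer_options": {"patches": [patch]}, "flag": 1}
clone = clone_options(options)

clone["flag"] = 2
clone["transformer_options"]["patches"].append("extra")
assert options["flag"] == 1                                  # original untouched
assert len(options["transformer_options"]["patches"]) == 1  # list was copied
assert clone["transformer_options"]["patches"][0] is patch  # object is shared
```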
comfyanonymous a4be04c5d7 Ace step prompts match now. (#12376) 2026-02-09 19:45:56 -05:00
blepping baf8c87455 Improvements to ACE-Steps 1.5 text encoding (part 2) (#12350) 2026-02-09 19:41:49 -05:00
rattus 62315fbb15 Dynamic VRAM fixes - Ace 1.5 performance + a VRAM leak (#12368)
* revert threaded model loader change

This change was only needed to get around the pytorch 2.7 mempool bugs,
and should have been reverted along with #12260. This fixes a different
memory leak where pytorch gets confused about cache emptying.

* load non comfy weights

* MPDynamic: Pre-generate the tensors for vbars

Apparently this is an expensive operation that slows down things.

* bump to aimdo 1.8

New features:
watermark limit feature
logging enhancements
-O2 build on linux
2026-02-09 16:16:08 -05:00
comfyanonymous f350a84261 Disable prompt weights for ltxv2. (#12354) 2026-02-07 19:16:28 -05:00
comfyanonymous 17e7df43d1 Pad ace step 1.5 ref audio if not long enough. (#12341) 2026-02-07 00:02:11 -05:00
comfyanonymous 039955c527 Some fixes to previous pr. (#12339) 2026-02-06 20:14:52 -05:00
tdrussell 6a26328842 Support fp16 for Cosmos-Predict2 and Anima (#12249) 2026-02-06 20:12:15 -05:00
comfyanonymous 204e65b8dc Fix bug with last pr (#12338) 2026-02-06 19:48:20 -05:00
asagi4 a831c19b70 Fix return_word_ids=True with Anima tokenizer (#12328) 2026-02-06 19:38:04 -05:00
comfyanonymous eba6c940fd Make ace step 1.5 base model work properly with default workflow. (#12337) 2026-02-06 19:14:56 -05:00
comfyanonymous c2d7f07dbf Fix issue when using disable_unet_model_creation (#12315) 2026-02-05 19:24:09 -05:00
comfyanonymous 458292fef0 Fix some lowvram stuff with ace step 1.5 (#12312) 2026-02-05 19:15:04 -05:00
comfyanonymous 6555dc65b8 Make ace step 1.5 work without the llm. (#12311) 2026-02-05 16:43:45 -05:00
comfyanonymous 35183543e0 Add VAE tiled decode node for audio. (#12299) 2026-02-05 01:12:04 -05:00
blepping a246cc02b2 Improvements to ACE-Steps 1.5 text encoding (#12283) 2026-02-05 00:17:37 -05:00
comfyanonymous a50c32d63f Disable sage attention on ace step 1.5 (#12297) 2026-02-04 22:15:30 -05:00
comfyanonymous 6125b80979 Add llm sampling options and make reference audio work on ace step 1.5 (#12295) 2026-02-04 21:29:22 -05:00
comfyanonymous c8fcbd66ee Try to fix ace text encoder slowness on some configs. (#12290) 2026-02-04 19:37:05 -05:00
comfyanonymous 26dd7eb421 Fix ace step nan issue on some hardware/pytorch configs. (#12289) 2026-02-04 18:25:06 -05:00
rattus ef73070ea4 mp: Fix checkpoint saving (#12268)
Fix regression in the recent model saving refactor. Pass the non unet
pieces down the layers so that checkpoints are complete.
2026-02-04 02:08:45 -05:00
rattus d30c609f5a utils: safetensors: dont slice data on torch level (#12266)
Torch has alignment enforcement when viewing with data type changes
but only relative to itself. Do all tensor constructions straight
off the memory-view individually so pytorch doesn't see an alignment
problem.

This is needed for handling misaligned safetensors weights, which are
reasonably common in third party models.

This limits usage of this safetensors loader to GPU compute only
as CPU kernels are very likely to bus error. But it works for
dynamic_vram, where we really don't want to take a deep copy and we
always use GPU copy_ which disentangles the misalignment.
2026-02-04 01:48:47 -05:00
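How a misaligned weight arises, and what "one view per tensor" means, can be sketched with the standard library alone. This is not the repository's loader: `build_file` and `tensor_views` are hypothetical helpers, and the sketch assumes the published safetensors layout (an 8-byte little-endian header length, a JSON header, then the raw byte buffer with offsets relative to its start).

```python
import json
import struct

def build_file(tensors):
    """Pack {name: (dtype, shape, raw_bytes)} into a safetensors-style blob."""
    header, payload = {}, b""
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [len(payload), len(payload) + len(raw)]}
        payload += raw
    head = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(head)) + head + payload

def tensor_views(blob):
    """Return {name: (dtype, shape, view, begin)} with one zero-copy view per tensor."""
    view = memoryview(blob)
    (head_len,) = struct.unpack_from("<Q", view, 0)
    header = json.loads(bytes(view[8:8 + head_len]))
    base = 8 + head_len  # start of the raw byte buffer
    out = {}
    for name, info in header.items():
        begin, end = info["data_offsets"]
        # Slice the memoryview directly instead of slicing one big typed array.
        out[name] = (info["dtype"], info["shape"], view[base + begin:base + end], begin)
    return out

# A 1-byte tensor in front pushes the float32 data to an odd offset.
blob = build_file({
    "flag": ("U8", [1], b"\x01"),
    "w": ("F32", [2], struct.pack("<2f", 1.0, 2.0)),
})
views = tensor_views(blob)
dtype, shape, raw, begin = views["w"]
assert begin % 4 != 0                            # not 4-byte aligned in the buffer
assert struct.unpack("<2f", raw) == (1.0, 2.0)   # bytes are still intact
```

Each view can then be handed to whatever constructs the tensor; whether a given consumer tolerates the misalignment is a separate question, as the commit message notes for CPU kernels.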
comfyanonymous a31681564d Fix crash with ace step 1.5 (#12264) 2026-02-04 00:03:21 -05:00
rattus 855849c658 mm: Remove Aimdo exemption for empty_cache (#12260)
It's more important to get the torch caching allocator GC up and running
than supporting the pyt2.7 bug. Switch it on.

Defeature dynamic_vram + pyt2.7.
2026-02-03 21:39:19 -05:00
comfyanonymous fe2511468d Support the 4B ace step 1.5 lm model. (#12257)
Can be used as an alternative to the 1.7B
2026-02-03 19:01:38 -05:00
comfyanonymous b8315e66cb Fix tiled vae for ace step 1.5 (#12253) 2026-02-03 14:40:45 -05:00
comfyanonymous ab1050bec3 Support ace step 1.5 base model loras. (#12252) 2026-02-03 13:54:23 -05:00
comfyanonymous 85fc35e8fa Fix mac issue. (#12250) 2026-02-03 12:19:39 -05:00
comfyanonymous 223364743c llama: cast logits as a comfy-weight (#12248)
This is using a different layer's weight with .to(). Change it to use
the ops caster if the original layer is a comfy weight so that it picks
up dynamic_vram and async_offload functionality in full.

Co-authored-by: Rattus <rattus128@gmail.com>
2026-02-03 11:31:36 -05:00
comfyanonymous affe881354 Fix some issues with mac. (#12247) 2026-02-03 11:07:04 -05:00
comfyanonymous f5030e26fd Add progress bar to ace step. (#12242) 2026-02-03 04:09:30 -05:00
comfyanonymous 3c1a1a2df8 Basic support for the ace step 1.5 model. (#12237) 2026-02-03 00:06:18 -05:00
comfyanonymous c05a08ae66 Add back function. (#12234) 2026-02-02 19:52:07 -05:00
rattus de9ada6a41 Dynamic VRAM unloading fix (#12227)
* mp: fix full dynamic unloading

This was not unloading dynamic models when requesting a full unload via
the unpatch() code path.

This was OK if your workflow was all dynamic models, but it fails with big
VRAM leaks if you need to fully unload something for a regular ModelPatcher.

It also fixes the "unload models" button.

* mm: load models outside of Aimdo Mempool

In dynamic_vram mode, escape the Aimdo mempool and load into the regular
mempool. Use a dummy thread to do it.
2026-02-02 17:35:20 -05:00