CollectionLoRA: Collecting 50 Effects in 1 LoRA via Multi-Teacher On-Policy Distillation

Fangtai Wu1,2, Hailong Guo2, Shijie Huang2, Jiayi Song2,3, Yubo Huang2,
Mushui Liu1, Zhao Wang1, Yunlong Yu1*, Jiaming Liu2*†, Ruihua Huang2
1Zhejiang University   2Qwen Applications, Alibaba   3Xi'an Jiaotong University

* Corresponding authors | † Project lead

CollectionLoRA distills many effect LoRAs and few-step generation into a single LoRA.

We propose CollectionLoRA, a multi-teacher on-policy distillation framework that consolidates diverse effects and few-step inference capabilities into a single LoRA.

Abstract

Customized image editing equips pre-trained diffusion models with specific visual effects through Low-Rank Adaptation (LoRA). As the number of desired effects grows, storing and dynamically loading numerous effect LoRAs becomes prohibitively expensive at deployment, and cascading them with acceleration LoRAs causes severe parameter interference, leading to concept bleeding and style degradation.

We propose CollectionLoRA, a multi-teacher on-policy distillation framework that distills up to 50 different effect LoRAs together with few-step generation capabilities into a single LoRA, fundamentally resolving feature interference while significantly reducing deployment overhead. The method introduces (i) a Probabilistic Dual-Stream Routing mechanism that randomly switches between data sources to enhance generalization on unseen scenarios; (ii) an Asymmetric Orthogonal Prompting strategy that achieves concept isolation in the prompt space; and (iii) a Coarse-to-Fine Distillation Objective that mitigates the distribution gap between the teacher and student models. Extensive experiments show that CollectionLoRA matches or surpasses independently trained teachers in concept fidelity at a fraction of the deployment cost.

Motivation

Conventional multi-LoRA pipeline vs. CollectionLoRA.

Conventional pipeline (left): each effect is trained as a separate LoRA and cascaded with an acceleration LoRA at inference, incurring storage cost, routing latency, and parameter interference. CollectionLoRA (right): a single distilled LoRA absorbs all effects together with the acceleration prior, eliminating runtime routing and parameter conflicts.

Method

Overall framework of CollectionLoRA.

Overall framework. (a) Probabilistic Dual-Stream Routing (PDSR) routes each batch into the effect stream or the general stream with probability $p_{\text{switch}}$. (b) The effect stream applies Asymmetric Orthogonal Prompting (AOP) and the Coarse-to-Fine Distillation Objective (C2F-DO): trajectory anchoring stabilizes the cold start, while target and backward simulation restore detail and global distribution. (c) The general stream performs standard DMD on unlabeled images to prevent catastrophic forgetting.
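The routing step in (a) can be pictured as a per-batch coin flip. The sketch below is illustrative only: the function names are hypothetical, and it assumes that $p_{\text{switch}}$ is the probability of choosing the effect stream, which the caption does not specify.

```python
import random
from collections import Counter


def route_batch(rng: random.Random, p_switch: float) -> str:
    """Pick the training stream for one batch.

    Assumption: p_switch is the probability of the effect stream.
    The source only says batches are routed "with probability p_switch".
    """
    if not 0.0 <= p_switch <= 1.0:
        raise ValueError("p_switch must lie in [0, 1]")
    # rng.random() is uniform on [0, 1), so p_switch=1.0 always picks "effect".
    return "effect" if rng.random() < p_switch else "general"


def stream_counts(num_batches: int, p_switch: float, seed: int = 0) -> Counter:
    """Count how many of num_batches land in each stream."""
    rng = random.Random(seed)
    return Counter(route_batch(rng, p_switch) for _ in range(num_batches))
```

Over many batches the share routed to the effect stream approaches `p_switch`, so a single scalar controls how much training signal comes from the effect teachers versus the general-image stream.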

Why Vanilla DMD Fails — and How C2F-DO Fixes It

Effectiveness of C2F-DO: from DMD collapse to detail recovery.

(a) Standard DMD collapses to an intermediate manifold under multi-teacher distillation. (b) Trajectory anchoring alone over-smooths textures; adding target simulation restores realistic high-frequency detail.
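For reference, the distribution matching distillation (DMD) update that panel (a) refers to is usually written in the single-teacher form below. This is the standard expression from the DMD literature, not the paper's multi-teacher objective; the symbols $G_\theta$ (few-step student), $s_{\text{real}}$ and $s_{\text{fake}}$ (score functions of the teacher and of the student's current outputs), $w_t$ (time-dependent weight), and $\alpha_t$, $\sigma_t$ (noise schedule) are notation assumed here for illustration.

$$
\nabla_\theta D_{\mathrm{KL}}\left(p_{\text{fake}} \,\|\, p_{\text{real}}\right) \approx \mathbb{E}_{z,\,t,\,\epsilon}\left[ w_t\,\alpha_t \left( s_{\text{fake}}(x_t, t) - s_{\text{real}}(x_t, t) \right) \frac{\partial G_\theta(z)}{\partial \theta} \right], \qquad x_t = \alpha_t\, G_\theta(z) + \sigma_t\, \epsilon
$$

The update moves student samples toward regions where the teacher assigns higher density than the student's own output distribution does.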

Quantitative Results

50 effects in one LoRA, 8 NFE, evaluated on EffectBench.

Setting         | Method           | CLIP ↑ | DreamSim ↓ | DINO ↑ | VSA ↑ | EditReward ↑ | BCR ↓ | NFE ↓
Single effect   | Base             | 0.726  | 0.434      | 0.611  | 4.075 | 1.007        | 0.141 | 40×2
Single effect   | Base + Lightning | 0.717  | 0.441      | 0.612  | 3.901 | 0.986        | 0.168 | 8
50 effects in 1 | FM + Lightning   | 0.703  | 0.468      | 0.611  | 4.150 | 0.929        | 0.217 | 8
50 effects in 1 | Ours             | 0.727  | 0.425      | 0.600  | 4.380 | 1.052        | 0.087 | 8

With a single LoRA at 8 NFE, CollectionLoRA outperforms the per-effect Base teachers on every metric except DINO (0.600 vs. 0.611) and lowers the Bad Case Ratio (BCR) from 0.141 to 0.087.
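To put the comparison with the Base teachers on a common scale, the gap can be expressed as a signed relative change per metric. The sketch below only re-uses the numbers from the table above; the helper names are hypothetical, and the sign convention (positive means Ours is better) is a choice made here.

```python
# Values copied from the EffectBench table (Single-effect Base vs. Ours).
BASE = {"CLIP": 0.726, "DreamSim": 0.434, "DINO": 0.611,
        "VSA": 4.075, "EditReward": 1.007, "BCR": 0.141}
OURS = {"CLIP": 0.727, "DreamSim": 0.425, "DINO": 0.600,
        "VSA": 4.380, "EditReward": 1.052, "BCR": 0.087}
LOWER_IS_BETTER = {"DreamSim", "BCR"}  # metrics marked with a down arrow


def relative_gain(metric: str) -> float:
    """Relative change of Ours over Base; positive means Ours is better."""
    change = (OURS[metric] - BASE[metric]) / BASE[metric]
    return -change if metric in LOWER_IS_BETTER else change


def summarize() -> dict:
    """Relative gain per metric, in percent, rounded to one decimal."""
    return {m: round(100 * relative_gain(m), 1) for m in BASE}
```

Under this convention the largest shift is in BCR, which is roughly 38% lower than Base, while DINO is the only metric with a negative gain (about 1.8%).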

LoRA Scaling (CLIP ↑)

Method                    | 10 LoRAs | 20 LoRAs | 50 LoRAs | 100 LoRAs | 180 LoRAs
Base                      | 0.735    | 0.724    | 0.726    | 0.723     | 0.724
Base + Lightning          | 0.716    | 0.712    | 0.717    | 0.717     | 0.722
All-in-1 (FM) + Lightning | 0.725    | 0.722    | 0.703    | 0.694     | 0.689
All-in-1 (Ours)           | 0.741    | 0.723    | 0.727    | 0.716     | 0.709

Our unified model scales to 180 effects while staying competitive with per-effect baselines (CLIP 0.709 vs. 0.724 for Base at 180 LoRAs, compared with 0.689 for All-in-1 (FM) + Lightning), while reducing storage and routing cost.

Deployment Cost

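Metric (by number of effects) | Method   | 10      | 20      | 50      | 100      | 150
Routing latency               | Baseline | 6.88s   | 6.95s   | 7.09s   | 7.22s    | 9.18s
Routing latency               | Ours     | 0s      | 0s      | 0s      | 7.22s    | 9.18s
Routing accuracy              | Baseline | 99%     | 94%     | 87%     | 85%      | 76%
Routing accuracy              | Ours     | 100%    | 100%    | 100%    | 90%      | 82%
Storage                       | Baseline | 2.2G×10 | 2.2G×20 | 2.2G×50 | 2.2G×100 | 2.2G×150
Storage                       | Ours     | 2.2G    | 2.2G    | 2.2G    | 2.2G×2   | 2.2G×3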

Qualitative Comparison

Qualitative comparison of CollectionLoRA against baseline methods.

CollectionLoRA preserves texture detail, style purity, and OOD generalization where cascaded multi-LoRA pipelines exhibit concept bleeding and style drift.

Zero-Shot Effect Composition

Zero-shot effect composition with CollectionLoRA.

Any two trained effects can be combined at inference time in a single forward pass—no extra fine-tuning required.

Ablation Studies

Exp. | PDSR | AOP | TS | TA-FM | CLIP ↑ | DreamSim ↓ | DINO ↑ | VSA ↑  | EditReward ↑ | BCR ↓
(1)  |      |     |    |       | 0.725  | 0.434      | 0.514  | 2.756  | 0.989        | 0.378
(2)  |      |     |    |       | 0.732  | 0.427      | 0.525  | 3.720  | 1.008        | 0.207
(3)  |      |     |    |       | 0.736* | 0.420*     | 0.541  | 4.018  | 0.979        | 0.199
(4)  |      |     |    |       | 0.727  | 0.426      | 0.590  | 4.248  | 0.976        | 0.108
(5)  |      |     |    |       | 0.727  | 0.425      | 0.600* | 4.380* | 1.052*       | 0.087*

Component ablation under the 50-in-1 concurrent setting. ✓ indicates the module is enabled. Best values per metric are marked with *. The full configuration (5) achieves the best DINO, VSA, EditReward, and BCR, while configuration (3) has the best CLIP and DreamSim.

Visual comparison of ablations across the proposed components.

Qualitative ablation. Removing PDSR, AOP, TS, or TA-FM leads to visible degradation in concept fidelity, texture detail, or style purity, while the full model preserves all of them.

Qualitative ablation: training dynamics with TA-FM and Target Simulation.

Training dynamics. Adding TA-FM and target simulation yields more stable convergence and higher fidelity than vanilla DMD.

DreamSim distance during training across ablations.

Style distance. DreamSim vs. optimization step.

CLIP score during training across ablations.

Alignment. CLIP score vs. optimization step.

More Qualitative Results

Additional qualitative panels covering 50 effects under the same student model.

BibTeX

@misc{wu2026collectionloracollecting50effects,
      title={CollectionLoRA: Collecting 50 Effects in 1 LoRA via Multi-Teacher On-Policy Distillation}, 
      author={Fangtai Wu and Hailong Guo and Shijie Huang and Jiayi Song and Yubo Huang and Mushui Liu and Zhao Wang and Yunlong Yu and Jiaming Liu and Ruihua Huang},
      year={2026},
      eprint={2605.25378},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.25378}, 
}