[Review] On-Policy Distillation

Title: On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes[1]
From: ICLR 2024, Google DeepMind (arXiv)

General KD (Knowledge Distillation) methods suffer from a mismatch between the distribution of the teacher model's outputs and the distribution of the student model's own outputs…

Tags: Distillation, LLM
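To make the mismatch concrete, here is a minimal sketch of one on-policy distillation step in the spirit of the paper: the student samples its own continuations, and the teacher supervises exactly those self-generated tokens. The `student`/`teacher` objects, the HuggingFace-style `generate`/`logits` interface, and the choice of reverse KL as the token-level loss are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def on_policy_distill_step(student, teacher, prompt_batch, max_new_tokens=128):
    """One on-policy distillation step (illustrative sketch).

    The student generates its own continuations, then the teacher supervises
    the student on those self-generated tokens, so the training distribution
    matches what the student actually produces at inference time.
    Assumes HuggingFace-style causal LMs; `prompt_batch` is a dict with
    `input_ids` / `attention_mask`.
    """
    # 1) Sample sequences from the student's own distribution (on-policy data).
    student.eval()
    with torch.no_grad():
        sequences = student.generate(
            **prompt_batch, do_sample=True, max_new_tokens=max_new_tokens
        )
    student.train()

    # 2) Score the self-generated sequences with both models.
    student_logits = student(sequences).logits[:, :-1]      # keeps gradients
    with torch.no_grad():
        teacher_logits = teacher(sequences).logits[:, :-1]  # frozen teacher

    # 3) Token-level divergence on the student's own samples.
    #    Reverse KL (mode-seeking) is used here for illustration; forward KL
    #    or a generalized JSD are alternative choices. In practice, prompt
    #    and padding positions would be masked out before averaging.
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    loss = (s_logp.exp() * (s_logp - t_logp)).sum(-1).mean()
    return loss
```

The key difference from standard (off-policy) KD is step 1: the divergence is measured on sequences the student itself samples, rather than on fixed ground-truth or teacher-generated text, which is what removes the train/inference distribution mismatch described above.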