Category: ICLR

1 post

[Review] On-Policy Distillation
Title: On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes[1] · ICLR 2024 · Google DeepMind · arXiv. Generic KD (Knowledge Distillation) methods suffer from a gap between the teacher model's output distribution and the student model's output distribution…