Announcement_3
New preprint out: Boomerang Distillation Enables Zero-Shot Model Size Interpolation. We show that given a teacher and a single distilled student model, you can create models of intermediate sizes without any additional training!