Distilling Foundation Models via Energy Hessians
Ishan Amin (et al.) · Paper · June 11, 2026
- We don’t actually need the full Hessian for distillation; randomly selected rows work? This is really useful if true
- Picking the rows smartly could do something here (e.g. normal modes?)
- Numerical Hessians seem to be good, too
- Don’t have to have source code to run autograd myself! Nice for heavily wrapped models
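
A rough sketch of how random rows and numerical Hessians could fit together, assuming only black-box access to forces. None of this is the paper's code: `toy_forces`, `hessian_rows_fd`, the quadratic toy energy, and the step size `h` are all my own stand-ins. Since the Hessian is symmetric, row i is minus the derivative of the forces with respect to coordinate i, so each row costs two force calls with central differences.

```python
import numpy as np

def toy_forces(x, K):
    # Hypothetical stand-in for a black-box model:
    # E(x) = 0.5 * x^T K x, so the forces are F(x) = -K x.
    return -K @ x

def hessian_rows_fd(forces_fn, x, rows, h=1e-3):
    """Central finite-difference estimate of selected Hessian rows.

    Uses H[i, :] = -dF/dx_i, which only needs force evaluations,
    not autograd through the model.
    """
    out = np.empty((len(rows), x.size))
    for k, i in enumerate(rows):
        step = np.zeros_like(x)
        step[i] = h
        out[k] = -(forces_fn(x + step) - forces_fn(x - step)) / (2 * h)
    return out

rng = np.random.default_rng(0)
n = 9                                  # e.g. 3 atoms x 3 coordinates
A = rng.normal(size=(n, n))
K = A @ A.T                            # symmetric toy Hessian
x = rng.normal(size=n)

# Randomly pick a few rows instead of building the full n x n Hessian.
rows = rng.choice(n, size=3, replace=False)
H_rows = hessian_rows_fd(lambda y: toy_forces(y, K), x, rows)

# For the quadratic toy energy the exact rows are just K[rows].
max_err = np.abs(H_rows - K[rows]).max()
```

For the quadratic toy, the estimate should match the exact rows up to rounding. For a real model, `h` would trade truncation error against numerical noise, so it likely needs tuning.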