This is hyperparameters, not data (e.g. batch size, learning rate, ...). You actually can do reverse (little -> big) distillation pretty easily, e.g. the model doing the rephrasing generally doesn't have to be a big one.