To tackle the search problem, two main challenges need to be addressed. The first challenge is the huge search space with numerous plausible combinations. A model stacks many layers, each of which contains a different number of parameters. Small changes to any element of the architecture may result in a new neural network that could produce largely different performance even when trained on the same dataset. Model developers usually put laborious engineering effort into finding an appropriate architecture for the tiny model, which is time-consuming and computing resource-hungry. The second challenge is that the objective of this search problem, i.e., the performance of the tiny model after distillation, is very expensive to compute. It is impractical to train and evaluate each model we find in the searching process. Therefore, an easy-to-compute and effective predictive metric tailored to this difficult search problem is desired.
Note: this passage offers wording that can be reused when describing the estimator.
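To make the two challenges concrete, the following minimal sketch counts the candidates in a toy search space and estimates what exhaustive training would cost, then ranks sampled candidates with a cheap score instead. Everything in it is an assumption for illustration and does not come from the source: the width choices, the layer count, the one GPU-hour training cost, and in particular the placeholder proxy (the sum of layer widths), which merely stands in for whatever predictive metric is actually used.

```python
import random

# Assumed, illustrative search space: each layer picks one hidden width.
WIDTH_CHOICES = (64, 128, 256, 512)
NUM_LAYERS = 12


def search_space_size(num_layers, num_choices):
    """Number of architectures when every layer chooses independently."""
    return num_choices ** num_layers


def exhaustive_cost_gpu_years(num_candidates, gpu_hours_per_candidate):
    """GPU-years needed to train and evaluate every candidate once."""
    return num_candidates * gpu_hours_per_candidate / (24 * 365)


def sample_architecture(rng):
    """Draw one candidate architecture as a tuple of per-layer widths."""
    return tuple(rng.choice(WIDTH_CHOICES) for _ in range(NUM_LAYERS))


def rank_by_proxy(candidates, proxy):
    """Sort candidates by a cheap score, best first; no training involved."""
    return sorted(candidates, key=proxy, reverse=True)


if __name__ == "__main__":
    size = search_space_size(NUM_LAYERS, len(WIDTH_CHOICES))
    print("candidates:", size)
    # Assumed cost of one GPU-hour per distill-and-evaluate run.
    print("GPU-years:", round(exhaustive_cost_gpu_years(size, 1.0)))

    rng = random.Random(0)
    pool = [sample_architecture(rng) for _ in range(1000)]
    # Placeholder proxy (sum of widths); a real metric would replace it.
    best = rank_by_proxy(pool, proxy=sum)[0]
    print("top candidate under the placeholder proxy:", best)
```

Even this small space holds more than sixteen million candidates, which is why scoring a candidate has to be far cheaper than training it.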