There are two steps in our framework: pre-training and fine-tuning. During pre-training, the model is trained on unlabeled data over different pre-training tasks. For fine-tuning, the BERT model is first initialized with the pre-trained parameters, and all of the parameters are fine-tuned using labeled data from the downstream tasks.
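A minimal sketch of the fine-tuning step, assuming the Hugging Face `transformers` library rather than the original implementation; the model name, label count, and example input are illustrative placeholders.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Initialize with the pre-trained BERT parameters; a task-specific
# classification head is added on top with random initialization.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # hypothetical two-class task
)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# One labeled downstream example (hypothetical sentiment pair).
batch = tokenizer("a delightful film", return_tensors="pt")
labels = torch.tensor([1])

# All of the parameters are fine-tuned: the optimizer receives every
# weight, pre-trained encoder and newly added head alike.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # forward pass with loss
outputs.loss.backward()
optimizer.step()
```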