We note that improved reconstruction may come at the cost of increased feature absorption (Karvonen et al., 2024).
The close agreement in Fig. 5 indicates that the SAEs reconstruct the residual-stream representation well at each layer. How large is the reconstruction MSE for the hyperparameter settings covered in Fig. 8? Have you shared any results on the SAE training itself?
There is a tradeoff between reconstruction error and L0 sparsity, but at what point does the analysis tell us more about the SAEs themselves than about ESM2?
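For concreteness, here is a minimal sketch of the quantities I have in mind, computed per batch. It assumes a plain ReLU SAE and reports raw MSE, a normalized variant (fraction of variance unexplained, which I am assuming as a common choice, not something stated in the paper), and mean L0. The weights below are random toy values for illustration only, not the authors' trained SAEs, and `sae_metrics` is a hypothetical helper.

```python
import numpy as np

def sae_metrics(x, x_hat, f):
    """Return (mse, fvu, mean_l0) for one batch.

    x, x_hat: (batch, d_model) original and reconstructed activations
    f: (batch, d_sae) SAE feature activations
    """
    # Raw reconstruction error, averaged over all entries
    mse = float(np.mean((x - x_hat) ** 2))
    # Fraction of variance unexplained: squared error normalized by the
    # variance of x around its batch mean (assumed normalization)
    fvu = float(np.sum((x - x_hat) ** 2) / np.sum((x - x.mean(axis=0)) ** 2))
    # L0: average number of nonzero features per input
    mean_l0 = float(np.mean(np.count_nonzero(f, axis=1)))
    return mse, fvu, mean_l0

# Toy ReLU SAE with random (untrained) weights, for illustration only
rng = np.random.default_rng(0)
d_model, d_sae, batch = 16, 64, 32
W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
W_dec = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_sae)
b_enc = -0.5 * np.ones(d_sae)  # negative bias pushes some features to zero

x = rng.normal(size=(batch, d_model))
f = np.maximum(x @ W_enc + b_enc, 0.0)  # encoder: ReLU(x W_enc + b_enc)
x_hat = f @ W_dec                       # linear decoder
print(sae_metrics(x, x_hat, f))
```

Reporting MSE (or FVU) alongside mean L0 for each setting in Fig. 8 would make it easier to see where along the tradeoff each SAE sits.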