Is there anything that shows what the different layers of an NLP model learn, as clearly as Zeiler and Fergus showed what the different layers of an image model learn? arxiv.org/abs/1311.2901
I'm looking for something to help my students understand model structure and fine-tuning for pretrained NLP models.
@jeremyphoward Maybe some of the "BERTology" papers would help? huggingface.co/docs/transform… aclanthology.org/2020.tacl-1.54/
@jeremyphoward Something like the slide at 18:44, which is really telling (it shows the t-SNE projection of the last embedding layer before and after fine-tuning) youtu.be/3kmfiupSyPQ
@jeremyphoward github.com/jessevig/bertv… might be helpful
@jeremyphoward If I'm not mistaken, @letiepi has done some work on this!