Is there anything that shows what the different layers of an NLP model learn, in the same way that Zeiler and Fergus showed what the different layers of an image model learn? arxiv.org/abs/1311.2901
I'm looking for something to help my students understand model structure and fine-tuning for pretrained NLP models.
@jeremyphoward For LSTMs I really liked the visualisations in Karpathy's "unreasonable effectiveness" blog post and paper: karpathy.github.io/2015/05/21/rnn…
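A minimal PyTorch sketch of the extraction mechanics behind those visualisations, using a toy, untrained character LSTM (the model and text here are placeholders; a trained network is what makes the activations interesting):

```python
import torch
import torch.nn as nn

# Toy, untrained char-LSTM: the point is only how to pull out per-character
# activations for one hidden unit, in the spirit of Karpathy's colour-coded plots.
text = "def add(a, b):\n    return a + b\n"
vocab = sorted(set(text))
idx = {c: i for i, c in enumerate(vocab)}

embed = nn.Embedding(len(vocab), 16)
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

x = torch.tensor([[idx[c] for c in text]])   # (1, seq_len)
out, _ = lstm(embed(x))                      # out: (1, seq_len, 32)
unit = out[0, :, 0]                          # hidden unit 0 over the sequence

for ch, a in zip(text, unit.tolist()):
    print(repr(ch), round(a, 3))             # Karpathy maps values like these to colour
```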
@jeremyphoward Could be relevant (off the top of my head): Liu et al., 2019 (arxiv.org/pdf/1903.08855…), Peters et al., 2018 (arxiv.org/pdf/1808.08949…), Voita et al., 2019 (arxiv.org/pdf/1909.01380…).
@jeremyphoward I’m currently preparing a paper on this; I’d be happy to find some time to talk to you about it this week or next.
@jeremyphoward Great post about an EMNLP paper by Elena Voita: "Evolution of Representations in the Transformer" lena-voita.github.io/posts/emnlp19_…
@jeremyphoward What came to mind was this by @ch402, @catherineols, et al.: transformer-circuits.pub/2021/framework…
@jeremyphoward There was the work on CLIP by @ch402 and collaborators distill.pub/2021/multimoda…
@jeremyphoward Just found the implementation on GitHub: github.com/tetrachrome/su…
@jeremyphoward I think this could be useful: github.com/jessevig/bertv…. It is referred to in this book: amazon.com/Natural-Langua…
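For anyone trying bertviz with a Hugging Face model, the basic usage looks roughly like this (the model name and example sentence are arbitrary; head_view renders an interactive view inside a Jupyter notebook):

```python
from transformers import AutoModel, AutoTokenizer
from bertviz import head_view

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")
outputs = model(**inputs)

# outputs.attentions is a tuple with one (batch, heads, seq, seq) tensor per layer
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
head_view(outputs.attentions, tokens)  # interactive head-by-head attention view
```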
@jeremyphoward The logit lens work from nostalgebraist is really nice: lesswrong.com/posts/AcKRB8wD… Anthropic's work on induction heads was also very interesting (transformer-circuits.pub/2022/in-contex…); they open-sourced a piece of software (github.com/anthropics/PyS…), although it's no longer supported.
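The logit lens itself is only a few lines: project each layer's hidden state through the model's final layer norm and unembedding matrix to see what the model "would predict" at that depth. A rough sketch with GPT-2 (the prompt and model size are arbitrary choices):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# hidden_states[0] is the embedding output; the last entry has already passed
# through ln_f, so re-applying it there is redundant but fine for a rough picture.
for i, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h))
    top_id = logits[0, -1].argmax().item()
    print(f"layer {i:2d}: {tokenizer.decode([top_id])!r}")
```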
@jeremyphoward This is an interesting one by Chen, Olshausen and LeCun arxiv.org/abs/2103.15949
@jeremyphoward Hi, we tried to visualise this for QA tasks in 2019. You can find the demo at: visbert.demo.datexis.com
@jeremyphoward Visualizing attention heads in transformers has always been a helpful way for me to reason about the intuition behind attention-based networks. aclanthology.org/P19-3007/
@jeremyphoward If any recent work in AI can relate itself to (sub-)symbolic inference, it must be the anatomy of a learned NLP model.
@jeremyphoward This might be helpful: jalammar.github.io/hidden-states/
@jeremyphoward RNN-based architectures (for language) and layered CNN ones (for images) learn differently. I don't know if the learning can be human-interpretable when broken down.