
Discussions:
Hacker News (64 points, 3 comments), Reddit r/MachineLearning (219 points, 18 comments)


Translations: Simplified Chinese, French, Korean, Russian, Turkish

This year, we saw a dazzling application of machine learning. OpenAI's GPT-2 exhibited an impressive ability to write coherent and passionate essays that exceeded what we anticipated current language models could produce. GPT-2 wasn't a particularly novel architecture; it is very similar to the decoder-only transformer. GPT-2 was, however, a very large, transformer-based language model trained on a massive dataset. In this post, we'll look at the architecture that enabled the model to produce its results. We will go into the depths of its self-attention layer. And then we'll look at applications for the decoder-only transformer beyond language modeling.
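To make the "decoder-only" idea concrete before we dive in, here is a minimal NumPy sketch of a single masked (causal) self-attention head. This is not GPT-2's actual implementation; the function name and the projection matrices `Wq`, `Wk`, `Wv` are made up for illustration. The key point is the mask, which stops each position from attending to tokens that come after it:

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Masked (causal) self-attention for one head.

    x:          (seq_len, d_model) token vectors
    Wq, Wk, Wv: (d_model, d_head) projection matrices
    Each position may only attend to itself and earlier positions,
    which is what lets a decoder-only transformer act as a language model.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv                 # project into queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])          # scaled dot-product scores
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores[mask] = -1e9                              # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the allowed positions
    return weights @ v                               # weighted sum of value vectors

# Tiny illustrative usage with random weights
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                          # 4 tokens, model width 8
Wq, Wk, Wv = [rng.normal(size=(8, 8)) for _ in range(3)]
out = causal_self_attention(x, Wq, Wk, Wv)           # (4, 8) contextualized vectors
```

A full GPT-2 block also includes multiple heads, learned positional information, layer normalization, and a feed-forward sublayer; the sketch above only isolates the masking that distinguishes the decoder-only setup.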

My goal here is also to supplement my earlier post, The Illustrated Transformer, with more visuals explaining the inner workings of transformers and how they've evolved since the original paper. My hope is that this visual language will make it easier to explain later Transformer-based models as their inner workings continue to evolve.


