A few months ago, I shared the article "Understanding Large Language Models: A Cross-Section of the Most Relevant Literature To Get Up to Speed," and the positive feedback was very motivating! Since then, I've added a few papers here and there to keep the list fresh and relevant.
