Apple discussed its large language model (LLM) Reference Resolution As Language Modeling (ReALM) and how it can “substantially outperform” OpenAI’s GPT-4. Apple said that while LLMs are extremely powerful for a variety of tasks, their use in reference resolution, particularly for non-conversational entities, remains underutilised.
Discussions:
Hacker News (64 points, 3 comments), Reddit r/MachineLearning (219 points, 18 comments)
Translations: Simplified Chinese, French, Korean, Russian, Turkish
This year, we saw a dazzling application of machine learning. OpenAI’s GPT-2 exhibited an impressive ability to write coherent and passionate essays that exceed what we anticipated current language models could produce. GPT-2 wasn’t a particularly novel architecture – its architecture is very similar to the decoder-only transformer. GPT-2 was, however, a very large transformer-based language model trained on a massive dataset. In this post, we’ll look at the architecture that enabled the model to produce its results. We will go into the depths of its self-attention layer. Then we’ll look at applications of the decoder-only transformer beyond language modeling.
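Before diving in, here is a minimal NumPy sketch of the key ingredient mentioned above: the *masked* (causal) self-attention that defines a decoder-only transformer, where each position can attend only to itself and earlier positions. The function name, shapes, and random weights are illustrative, not GPT-2's actual parameters.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head masked self-attention over a (seq_len, d_model) input.
    Each position attends only to itself and earlier positions."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)             # (seq, seq) attention scores
    mask = np.triu(np.ones_like(scores), k=1)   # 1s strictly above the diagonal
    scores = np.where(mask == 1, -1e9, scores)  # block attention to future tokens
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                          # weighted sum of value vectors

# Toy example: 4 tokens, model dim 8, head dim 8 (dimensions are illustrative).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = causal_self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8)
```

Note that because of the mask, the output at position 0 is just that token's own value vector – the first token has nothing earlier to attend to.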
My goal here is to also supplement my earlier post, The Illustrated Transformer, with more visuals explaining the inner