Author Interview - Typical Decoding for Natural Language Generation

Yannic Kilcher Videos (Audio Only) - En podkast av Yannic Kilcher

Kategorier:

#deeplearning #nlp #sampling This is an interview with first author Clara Meister. Paper review video hereé https://youtu.be/_EDr3ryrT_Y Modern language models like T5 or GPT-3 achieve remarkably low perplexities on both training and validation data, yet when sampling from their output distributions, the generated text often seems dull and uninteresting. Various workarounds have been proposed, such as top-k sampling and nucleus sampling, but while these manage to somewhat improve the generated samples, they are hacky and unfounded. This paper introduces typical sampling, a new decoding method that is principled, effective, and can be implemented efficiently. Typical sampling turns away from sampling purely based on likelihood and explicitly finds a trade-off between generating high-probability samples and generating high-information samples. The paper connects typical sampling to psycholinguistic theories on human speech generation, and shows experimentally that typical sampling achieves much more diverse and interesting results than any of the current methods. Sponsor: Introduction to Graph Neural Networks Course https://www.graphneuralnets.com/p/int... OUTLINE: 0:00 - Intro 0:35 - Sponsor: Introduction to GNNs Course (link in description) 1:30 - Why does sampling matter? 5:40 - What is a "typical" message? 8:35 - How do humans communicate? 10:25 - Why don't we just sample from the model's distribution? 15:30 - What happens if we condition on the information to transmit? 17:35 - Does typical sampling really represent human outputs? 20:55 - What do the plots mean? 31:00 - Diving into the experimental results 39:15 - Are our training objectives wrong? 41:30 - Comparing typical sampling to top-k and nucleus sampling 44:50 - Explaining arbitrary engineering choices 47:20 - How can people get started with this? Paper: https://arxiv.org/abs/2202.00666 Code: https://github.com/cimeister/typical-... Abstract: Despite achieving incredibly low perplexities on myriad natural language corpora, today's language models still often underperform when used to generate text. This dichotomy has puzzled the language generation community for the last few years. In this work, we posit that the abstraction of natural language as a communication channel (à la Shannon, 1948) can provide new insights into the behaviors of probabilistic language generators, e.g., why high-probability texts can be dull or repetitive. Humans use language as a means of communicating information, and do so in a simultaneously efficient and error-minimizing manner; they choose each word in a string with this (perhaps subconscious) goal in mind. We propose that generation from probabilistic models should mimic this behavior. Rather than always choosing words from the high-probability region of the distribution--which have a low Shannon information content--we sample from the set of words with information content close to the conditional entropy of our model, i.e., close to the expected information content. This decision criterion can be realized through a simple and efficient implementation, which we call typical sampling. Automatic and human evaluations show that, in comparison to nucleus and top-k sampling, typical sampling offers competitive performance in terms of quality while consistently reducing the number of degenerate repetitions. Authors: Clara Meister, Tiago Pimentel, Gian Wiher, Ryan Cotterell Links: Merch: store.ykilcher.com TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF

Visit the podcast's native language site