5.1. Language Models
Statistical
N-gram Language Models, Jurafsky and Martin, Chapter 3 in Speech and Language Processing (3rd ed.), 2023.
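A minimal sketch of what the n-gram chapter covers: maximum-likelihood bigram probabilities estimated from counts, with add-k (Laplace) smoothing. The toy corpus and function names below are illustrative choices, not taken from the book.

```python
from collections import Counter

# Toy corpus; in practice this would be a large tokenized text collection.
corpus = [
    ["<s>", "the", "cat", "sat", "</s>"],
    ["<s>", "the", "dog", "sat", "</s>"],
    ["<s>", "the", "cat", "ran", "</s>"],
]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((sent[i], sent[i + 1]) for sent in corpus for i in range(len(sent) - 1))
vocab_size = len(unigrams)

def bigram_prob(prev, word, k=1.0):
    """P(word | prev) with add-k (Laplace) smoothing."""
    return (bigrams[(prev, word)] + k) / (unigrams[prev] + k * vocab_size)

print(bigram_prob("the", "cat"))   # higher: "the cat" is seen twice
print(bigram_prob("the", "bird"))  # lower: unseen pair, survives only via smoothing
```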
Neural-based
Efficient Estimation of Word Representations in Vector Space, Mikolov et al., ICLR, 2013. <- Word2Vec
GloVe: Global Vectors for Word Representation, Pennington et al., EMNLP, 2014.
Deep Contextualized Word Representations, Peters et al., NAACL, 2018. <- ELMo
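For the Word2Vec entry above, a quick way to see distributional embeddings in practice is gensim's skip-gram trainer. The toy corpus and parameter values are mine, and the snippet assumes gensim 4.x; none of this comes from the papers themselves.

```python
from gensim.models import Word2Vec  # assumes gensim 4.x is installed

# Toy tokenized corpus; real skip-gram training needs millions of tokens.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["a", "cat", "and", "a", "dog", "played"],
]

# sg=1 selects the skip-gram objective from Mikolov et al. (sg=0 would be CBOW).
model = Word2Vec(sentences, vector_size=32, window=2, min_count=1, sg=1, epochs=100)

print(model.wv["cat"].shape)         # a dense 32-dimensional vector
print(model.wv.most_similar("cat"))  # nearest neighbours by cosine similarity
```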
Transformers
Attention is All You Need, Vaswani et al., NIPS, 2017. <- Transformer
Generating Wikipedia by Summarizing Long Sequences, Liu et al., ICLR, 2018.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al., NAACL, 2019.
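The core operation introduced in "Attention is All You Need" is scaled dot-product attention. The NumPy sketch below shows a single unmasked head; shapes and variable names are chosen for illustration only.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in Vaswani et al."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # (seq_q, seq_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # row-wise softmax
    return weights @ V                                     # weighted sum of value vectors

# Toy example: 4 query/key/value vectors of dimension 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)         # (4, 8)
```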
Tokenization
Neural Machine Translation of Rare Words with Subword Units, Sennrich et al., ACL, 2016. <- Byte-Pair Encoding (BPE)
Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, Wu et al., arXiv, 2016. <- WordPiece
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, Kudo and Richardson, EMNLP, 2018.
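Byte-Pair Encoding (Sennrich et al., above) builds a subword vocabulary by repeatedly merging the most frequent adjacent symbol pair. This is a stripped-down sketch of that merge loop over the low/lower/newest/widest example from the paper, not the reference implementation.

```python
from collections import Counter

# Word frequencies, with each word split into characters plus an end-of-word marker.
vocab = {("l", "o", "w", "</w>"): 5,
         ("l", "o", "w", "e", "r", "</w>"): 2,
         ("n", "e", "w", "e", "s", "t", "</w>"): 6,
         ("w", "i", "d", "e", "s", "t", "</w>"): 3}

def most_frequent_pair(vocab):
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(pair, vocab):
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])   # merge the pair into one symbol
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

for _ in range(5):                  # the number of merges sets the vocabulary budget
    pair = most_frequent_pair(vocab)
    vocab = merge_pair(pair, vocab)
    print("merged", pair)
```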
GPT (Generative Pre-trained Transformer)
Improving Language Understanding by Generative Pre-Training, Radford et al., OpenAI, 2018. <- GPT-1
Language Models are Unsupervised Multitask Learners, Radford et al., OpenAI, 2019. <- GPT-2
Language Models are Few-Shot Learners, Brown et al., NeurIPS, 2020. <- GPT-3
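All three GPT papers share the same inference-time procedure: repeatedly score the context and append one sampled token. The loop below sketches that shape only; toy_next_token_logits is a hypothetical stand-in for a Transformer decoder forward pass, not any OpenAI API.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<eos>", "the", "cat", "sat", "on", "mat"]

def toy_next_token_logits(context_ids):
    """Hypothetical stand-in for a Transformer decoder's next-token logits."""
    return rng.normal(size=len(vocab))

def generate(prompt_ids, max_new_tokens=10, temperature=1.0):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = toy_next_token_logits(ids) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                                 # softmax over the vocabulary
        next_id = int(rng.choice(len(vocab), p=probs))       # sample the next token
        ids.append(next_id)
        if vocab[next_id] == "<eos>":
            break
    return [vocab[i] for i in ids]

print(generate([vocab.index("the"), vocab.index("cat")]))
```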