5.1. Language Models

Statistical

  • N-gram Language Models, Jurafsky and Martin, Chapter 3 in Speech and Language Processing (3rd ed.), 2023.
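
The chapter's core model can be stated compactly: an n-gram language model approximates the probability of a word sequence by conditioning each word only on the previous n-1 words, with the conditional probabilities estimated from corpus counts (maximum likelihood, before smoothing). In standard notation:

```latex
% n-gram approximation: condition each word only on the previous n-1 words
P(w_1, \dots, w_T) \approx \prod_{t=1}^{T} P\big(w_t \mid w_{t-n+1}, \dots, w_{t-1}\big)

% maximum-likelihood estimate for the bigram case (n = 2), from raw counts
P(w_t \mid w_{t-1}) = \frac{C(w_{t-1}\, w_t)}{C(w_{t-1})}
```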

Neural-based

  • Efficient Estimation of Word Representations in Vector Space, Mikolov et al., ICLR, 2013. <- Word2Vec

  • GloVe: Global Vectors for Word Representation, Pennington et al., EMNLP, 2014.

  • Deep Contextualized Word Representations, Peters et al., NAACL, 2018. <- ELMo
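
All three papers above learn word vectors from raw text; a quick way to build intuition is to train a small skip-gram model yourself. A minimal sketch, assuming gensim >= 4.0 is available; the toy corpus and hyperparameters below are illustrative only, not taken from any of the papers:

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (purely illustrative).
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "dog", "chases", "the", "ball"],
    ["the", "cat", "chases", "the", "mouse"],
]

# sg=1 selects the skip-gram objective of Mikolov et al. (2013);
# vector_size, window and min_count are small values chosen for the toy data.
model = Word2Vec(sentences, vector_size=32, window=2, min_count=1, sg=1, epochs=50)

# Nearest neighbours of "king" in the learned vector space (cosine similarity).
print(model.wv.most_similar("king", topn=3))
```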

Transformers

  • Attention is All You Need, Vaswani et al., NIPS, 2017. <- Transformer

  • Generating Wikipedia by Summarizing Long Sequences, Liu et al., ICLR, 2018.

  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al., NAACL, 2019.
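
The central equation of Vaswani et al. is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal NumPy sketch of that single equation, without multi-head splitting or masking; the shapes are chosen only for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of every query to every key
    weights = softmax(scores, axis=-1)   # each query's weights over keys sum to 1
    return weights @ V                   # weighted average of the value vectors

# Illustrative shapes: 4 positions, d_k = d_v = 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```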

Tokenization

  • Neural Machine Translation of Rare Words with Subword Units, Sennrich et al., ACL, 2016. <- Byte-Pair Encoding (BPE)

  • Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, Wu et al., arXiv, 2016. <- WordPiece

  • SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, Kudo and Richardson, EMNLP, 2018.
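
Sennrich et al.'s byte-pair encoding learns a subword vocabulary by repeatedly merging the most frequent adjacent symbol pair in a word-frequency table. A compact sketch of that merge loop on a toy vocabulary; the word list and number of merges are made up for illustration, and real implementations also add end-of-word markers and scale to full corpora:

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs over all words, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Rewrite the vocabulary so the pair becomes a single merged symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Toy vocabulary: words pre-split into characters, with corpus frequencies (illustrative).
vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}

for _ in range(5):  # the number of merges is the tokenizer's main hyperparameter
    best_pair = get_pair_counts(vocab).most_common(1)[0][0]
    vocab = merge_pair(best_pair, vocab)
    print("merged:", best_pair)
```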

GPT (Generative Pre-trained Transformer)

  • Improving Language Understanding by Generative Pre-Training, Radford et al., OpenAI, 2018. <- GPT-1

  • Language Models are Unsupervised Multitask Learners, Radford et al., OpenAI, 2019. <- GPT-2

  • Language Models are Few-Shot Learners, Brown et al., NeurIPS, 2020. <- GPT-3
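
The arc across these three papers runs from fine-tuning a pre-trained transformer per task (GPT-1) to specifying the task entirely in the input text (GPT-2, GPT-3). A hypothetical few-shot prompt in the spirit of the in-context learning setup described in Brown et al.; the task and examples below are made up for illustration:

```python
# Few-shot prompting: a task description, a handful of demonstrations, then the query.
# The model is expected to continue the pattern; no gradient updates are involved.
prompt = """Translate English to French.

cat => chat
dog => chien
good morning => bonjour
cheese =>"""
```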
