Entropy and Perplexity
Entropy
Entropy is a measure of the uncertainty, randomness, or information content of a random variable or a probability distribution. The entropy of a random variable $X$ is defined as:

$$H(X) = -\sum_{x \in X} P(x) \log_2 P(x)$$
$P(x)$ is the probability distribution of $X$. The self-information of $x$ is defined as $I(x) = -\log_2 P(x)$, which measures how much information is gained when $x$ occurs. The negative sign indicates that as the probability of $x$ occurring increases, its self-information value decreases.
Entropy has several properties, including:
It is non-negative: $H(X) \geq 0$.
It is at its minimum ($H(X) = 0$) when $X$ is entirely predictable (all probability mass on a single outcome).
It is at its maximum ($H(X) = \log_2 |X|$) when all outcomes of $X$ are equally likely.
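As a quick illustration of the definition and these properties, here is a minimal Python sketch; the distributions are made up for the example.

```python
import math

def entropy(dist):
    """H(X) = -sum_x P(x) * log2 P(x), in bits; zero-probability outcomes contribute 0."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# A made-up distribution P(x) over four outcomes of a random variable X.
P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Self-information I(x) = -log2 P(x): rarer outcomes carry more information.
for x, p in P.items():
    print(f"I({x}) = {-math.log2(p):.3f} bits")

print(entropy(P))                                             # 1.75 bits
print(entropy({"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0}))      # minimum: 0.0 (fully predictable)
print(entropy({"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}))  # maximum: log2(4) = 2.0 (uniform)
```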
Q10: Why is a logarithmic scale used to measure self-information in entropy calculations?
Sequence Entropy
Sequence entropy is a measure of the unpredictability or information content of a word sequence; it quantifies how uncertain or random the sequence is.
Assume a long sequence of words, $W = \{w_1, w_2, \ldots, w_n\}$, concatenating the entire text from a language $L$. Let $\mathbb{W} = \{W_1, W_2, \ldots, W_k\}$ be a set of all possible sequences derived from $W$, where $W_1$ is the shortest sequence (a single word) and $W_k$ is the longest sequence. Then, the entropy of $W$ can be measured as follows:

$$H(W) = -\sum_{i=1}^{k} P(W_i) \log_2 P(W_i)$$
The entropy rate (per-word entropy), $H_r(W)$, can be measured by dividing $H(W)$ by the total number of words $n$:

$$H_r(W) = \frac{1}{n} H(W) = -\frac{1}{n} \sum_{i=1}^{k} P(W_i) \log_2 P(W_i)$$
In theory, there is an infinite number of unobserved word sequences in the language $L$. To estimate the true entropy of $L$, we need to take the limit of $H_r(W)$ as $n$ approaches infinity:

$$H(L) = \lim_{n \to \infty} H_r(W) = -\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{k} P(W_i) \log_2 P(W_i)$$
The Shannon–McMillan–Breiman theorem states that if the language is both stationary and ergodic, its entropy can be estimated from a single, sufficiently long sequence. By applying this theorem, $H(L)$ can be approximated:

$$H(L) \approx -\lim_{n \to \infty} \frac{1}{n} \log_2 P(w_1, w_2, \ldots, w_n)$$
Consequently, $H_r(W)$ is approximated as follows, where $P(W) = P(w_1, w_2, \ldots, w_n)$:

$$H_r(W) \approx -\frac{1}{n} \log_2 P(W)$$
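As a rough sketch of this approximation, the snippet below computes the per-word entropy $H_r(W)$ from the probability a language model assigns to a sequence. The `entropy_rate` helper and the per-word probabilities are illustrative placeholders, not output from a real model.

```python
import math

def entropy_rate(word_probs):
    """Approximate H_r(W) = -(1/n) * log2 P(W), where P(W) is the product
    of the per-word probabilities a language model assigns to w_1 .. w_n."""
    n = len(word_probs)
    log2_P_W = sum(math.log2(p) for p in word_probs)  # log2 P(W)
    return -log2_P_W / n

# Hypothetical per-word probabilities for a 5-word sequence W.
word_probs = [0.2, 0.1, 0.25, 0.05, 0.15]
print(f"H_r(W) = {entropy_rate(word_probs):.3f} bits per word")  # ≈ 2.941
```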
Q11: What indicates high entropy in a text corpus?
Perplexity
Perplexity measures how well a language model can predict a set of words based on the likelihood of those words occurring in a given text. The perplexity of a word sequence $W = \{w_1, w_2, \ldots, w_n\}$ is measured as:

$$PP(W) = P(W)^{-\frac{1}{n}} = \sqrt[n]{\frac{1}{P(w_1, w_2, \ldots, w_n)}}$$
Hence, the higher $P(W)$ is, the lower its perplexity becomes, implying that the language model is "less perplexed" and more confident in generating $W$.
Perplexity, $PP(W)$, can be directly derived from the approximated entropy rate, $H_r(W)$:

$$PP(W) = 2^{H_r(W)} = 2^{-\frac{1}{n} \log_2 P(W)} = P(W)^{-\frac{1}{n}}$$
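A minimal sketch of this relationship, using the same kind of made-up per-word probabilities as above: raising 2 to the approximated entropy rate gives the same value as computing $P(W)^{-\frac{1}{n}}$ directly.

```python
import math

# Hypothetical per-word probabilities a language model assigns to a 5-word sequence W.
word_probs = [0.2, 0.1, 0.25, 0.05, 0.15]
n = len(word_probs)
log2_P_W = sum(math.log2(p) for p in word_probs)  # log2 P(W)

H_r = -log2_P_W / n                               # approximated entropy rate H_r(W)
pp_from_entropy = 2 ** H_r                        # PP(W) = 2^{H_r(W)}

P_W = 2 ** log2_P_W                               # P(W), the sequence probability
pp_direct = P_W ** (-1 / n)                       # PP(W) = P(W)^{-1/n}

print(f"PP(W) via entropy rate:   {pp_from_entropy:.3f}")  # ≈ 7.68
print(f"PP(W) computed directly:  {pp_direct:.3f}")        # same value
```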
Q12: What is the relationship between corpus entropy and language model perplexity?